INTRODUCTION


The problem of classification is to correctly associate one population $\pi_0$ with exactly one of several distinct populations $\pi_1, \ldots, \pi_m$. The problem may be to classify a single unit or more than one unit coming from $\pi_0$. The distribution functions which characterize the populations are not completely known. Sometimes the density functions are known except for some unknown parameters. In a broader problem the distribution functions are not given explicitly in simple parametric forms.

In order to get more information on the distribution functions, data in the form of a "training sample" are collected in one of the following ways (depending on the situation):

(a) Separate samples from the different populations.

(b) A sample from the population which is a mixture of $\pi_1, \ldots, \pi_m$.

When the density functions are known except for some parameters, a plug-in rule is obtained by replacing the parameters by the corresponding estimates (generally maximum likelihood or some other consistent estimates) in the rule that is optimal, according to some specified criterion, in a given class.

One may consider Bayes rules, minimax rules, admissible rules, etc. Asymptotic properties of most of these rules are not difficult to obtain, but asymptotic expansions of the probabilities of misclassification (PMC) would be more useful. In Chapter one, we consider the problem of classifying one unit into one of three distinct multivariate normal distributions with a common but unknown covariance matrix. A plug-in rule is obtained by substituting the estimates of the parameters in the minimum distance (Mahalanobis distance) rule. Following T. W. Anderson (1973), who obtained a similar result when m = 2, we derive the asymptotic expansions of the PMC's and the estimated PMC's of this plug-in rule with an error of the order of the inverse square of the number of observations. No such results are available in the literature for more than two populations.

When density functions are completely unknown, estimates of density functions are used to obtain a plug-in rule for a given rule which involves density functions. In 1951, Fix and Hodges proposed a classification rule for the two-population problem based on nonparametric estimates of the density functions. The K-nearest neighbor (K-NN) rule thus proposed by Fix and Hodges is described as follows: Let $\{X_{ij};\ j = 1, \ldots, n_i\}$ be a random sample from the $i$th population. Consider a distance function $d$ and order all the values $d(X_{ij}, Z)$ ($Z$ is the observation to be classified), $j = 1, \ldots, n_i$; $i = 1, \ldots, m$. The K-NN rule assigns $Z$ to the population $\pi_i$ if $K_i/n_i = \max_j K_j/n_j$, where $K_j$ is the number of observations from $\pi_j$ among the $K$ observations "nearest" to $Z$. They obtained the exact and asymptotic expressions for the PMC of the NN rule when $K = 1$. The 1-NN rule was also studied by Cover and Hart. Cover and Hart (1967) considered the mixed population case and proposed a K-NN rule which assigns $Z$ to the population $\pi_i$ if $K_i = \max_j K_j$. In a recent paper, Goldstein (1972) has studied some asymptotic properties of the $K_n$-NN rules and obtained a consistent upper bound for their PMC. In Chapter two, we propose some rules which use the basic ideas of the NN rules but are expressed in terms of ranks, for use when the observations are available only in their relative orders or ranks and the usual NN rules can't be applied. However, it may be noted that the density functions can't be estimated using the ranked observations only. The asymptotic PMC's of these rules are derived and, when sampling from a mixed population is considered, asymptotic risks are obtained as well.

The asymptotic risk of the modified 1-NN rule is the same as the respective asymptotic risk of the 1-NN rule. The asymptotic risk of the modified $K_n$-NN rule turns out to be exactly the Bayes risk.
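The K-NN rule of Fix and Hodges described above can be sketched in a few lines. This is a minimal illustration for scalar observations, not the authors' own procedure: the function name `knn_classify`, the absolute-difference distance and the tie-breaking convention (smallest index wins) are assumptions made for the example.

```python
def knn_classify(z, samples, k, dist=lambda a, b: abs(a - b)):
    """Assign z to the population i maximizing K_i / n_i.

    samples[i] is the training sample from population i (0-based index);
    K_i counts how many of the k training points nearest to z come from it.
    Ties are broken in favor of the smallest index (an assumption).
    """
    # Pool every training point with its distance to z and its label.
    pooled = sorted((dist(x, z), i) for i, s in enumerate(samples) for x in s)
    counts = [0] * len(samples)
    for _, label in pooled[:k]:
        counts[label] += 1
    ratios = [counts[i] / len(s) for i, s in enumerate(samples)]
    return max(range(len(samples)), key=lambda i: ratios[i])


samples = [[0.0, 0.1, 0.2, 0.4], [5.0, 5.1, 5.3]]
print(knn_classify(0.3, samples, k=3))
print(knn_classify(4.8, samples, k=3))
```

Dividing by the sample sizes matters only when the training samples are unequal; with a mixed-population sample the Cover and Hart variant compares the raw counts instead.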

Another class of rules, based on U-statistics, is also suggested. Das Gupta (1964) proposed a rule based on Wilcoxon statistics. He showed that such a rule is consistent. Hudimoto (1964) also used the Wilcoxon statistic when $\int F_1\,dF_2 - \tfrac12 > 0$ and derived some bounds for the probability of error. Chanda and Lee (1975) modified Hudimoto's rule to the situation when either $\int F_1\,dF_2 - \tfrac12 > 0$ or $\int F_1\,dF_2 - \tfrac12 < 0$. We shall use Hudimoto's idea and suggest a two-sided classification rule based on the Lehmann statistic (Lehmann, 1951). Asymptotic results are of theoretical interest; however, good studies of the rate of convergence will be useful. Following Grams and Serfling (1973) in their study of the convergence rate for U-statistics, we obtain the asymptotic PMC's of these rules, together with the rate of convergence when the sizes of the training samples approach infinity. The strong consistency of the rules is also pointed out.

Finally, we consider sequential rules in order to attain prescribed probabilities of error. Hoeffding and Wolfowitz (1958) studied the problem of distinguishability of sets of distributions. Later the notion of distinguishability was used by Das Gupta and Kinderman (1974) in the set-up for classification problems. Hoeffding and Wolfowitz (1958) introduced the minimum distance test procedure and studied the properties of this test using the available probability bounds on the sample distribution function. In Chapter four, we shall introduce the minimum-U sequential rules and prove some properties of these rules by using the available probability inequality for U-statistics. Srivastava (1973) considered sequential rules for classification into one of two distinct multivariate normal distributions with means $\mu_1$, $\mu_2$ and a common covariance matrix $\Sigma$ in the following two cases: (i) $\delta$ is known but $\Sigma$ is unknown; (ii) both $\delta$ and $\Sigma$ are unknown. For case (i) Srivastava proposed a sequential rule based on observations from $\pi_0$ and the training populations and, given prescribed bounds on the error probabilities, showed that the PMC's of this rule tend to values less than these bounds as $\delta'\Sigma^{-1}\delta \to 0$. However, Srivastava's proof is incomplete and suffers from a technical error. We shall present a more rigorous analysis of his rule. Srivastava also proved that for his rule in case (ii) the error can be controlled arbitrarily as $\delta'\Sigma^{-1}\delta \to 0$. But his proof is entirely wrong and we shall indicate his error.
CHAPTER I


CLASSIFICATION  INTO  ONE  OF  THREE  MULTIVARIATE  NORMAL 
DISTRIBUTIONS 


1.0  Introduction 

A random observation $X$ is drawn from $N_p(\mu, \Sigma)$. The problem is to classify this distribution into one of $N_p(\mu_1, \Sigma)$, $N_p(\mu_2, \Sigma)$ and $N_p(\mu_3, \Sigma)$. It is assumed that $\mu_1$, $\mu_2$ and $\mu_3$ are distinct and $\Sigma$ is nonsingular (see T. W. Anderson (1958), Chapter 6). When $\mu_1$, $\mu_2$, $\mu_3$ and $\Sigma$ are known and the costs of misclassification are equal, then, under the assumption that a new observation is equally likely to be drawn from each population, the optimal classification rule $\delta$ (minimizing the expected loss from the cost of misclassification) decides $\mu = \mu_i$ iff

$(X-\mu_i)'\Sigma^{-1}(X-\mu_i) = \min_{j=1,2,3} (X-\mu_j)'\Sigma^{-1}(X-\mu_j)$, which may be written as

(1.1)  $U_{ij} = (X - \tfrac12(\mu_i+\mu_j))'\Sigma^{-1}(\mu_i-\mu_j) > 0$ and
       $U_{ik} = (X - \tfrac12(\mu_i+\mu_k))'\Sigma^{-1}(\mu_i-\mu_k) > 0$,

where $i, j, k = 1, 2, 3$; $i \ne j$, $j \ne k$, $k \ne i$.
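The equivalence between the minimum Mahalanobis distance rule and the pairwise conditions in (1.1) follows from $d_j - d_i = 2U_{ij}$, where $d_i$ is the squared distance to $\mu_i$. The sketch below checks this numerically for $p = 2$; the helper names and the numerical values of the means and of $\Sigma$ are illustrative assumptions, not quantities from the text.

```python
def inv2(S):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = S
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def bilinear(u, M, v):
    """u' M v for 2-vectors u, v and a 2x2 matrix M."""
    return sum(u[i] * M[i][j] * v[j] for i in range(2) for j in range(2))


def min_distance_rule(x, mus, Sigma):
    """Index i minimizing (x - mu_i)' Sigma^{-1} (x - mu_i)."""
    Sinv = inv2(Sigma)
    dists = []
    for mu in mus:
        diff = [x[0] - mu[0], x[1] - mu[1]]
        dists.append(bilinear(diff, Sinv, diff))
    return dists.index(min(dists))


def pairwise_rule(x, mus, Sigma):
    """Index i with U_ij > 0 for every j != i, as in (1.1); None on a boundary."""
    Sinv = inv2(Sigma)

    def U(i, j):
        centered = [x[t] - 0.5 * (mus[i][t] + mus[j][t]) for t in range(2)]
        direction = [mus[i][t] - mus[j][t] for t in range(2)]
        return bilinear(centered, Sinv, direction)

    for i in range(len(mus)):
        if all(U(i, j) > 0 for j in range(len(mus)) if j != i):
            return i
    return None


mus = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]      # illustrative means
Sigma = [[1.0, 0.3], [0.3, 2.0]]                # illustrative covariance
points = [(0.2, -0.5), (1.8, 0.4), (-0.3, 2.5), (0.9, 1.4)]
for x in points:
    print(x, min_distance_rule(x, mus, Sigma), pairwise_rule(x, mus, Sigma))
```

Both functions return 0-based indices, so index 0 corresponds to $\mu_1$.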


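To compute the PMC's of this rule, let us assume that $\mu = \mu_1$. Then

(1.2)  $\pi_{21} = \Pr(\delta \text{ decides } \mu = \mu_2 \mid \mu = \mu_1)$
       $= \Pr(U_{21} > 0,\ U_{23} > 0 \mid \mu = \mu_1)$.

Let

(1.3)  $\sigma_{ij} = (\mu_1-\mu_i)'\Sigma^{-1}(\mu_1-\mu_j)$,  $i, j = 2, 3$.

Then

(1.4)  $\pi_{21} = \Psi(\alpha, \beta; \rho)$,

where

(1.5)  $\alpha = \tfrac12\sigma_{22}^{1/2}$,  $\beta = \tfrac12(\sigma_{22}-\sigma_{33})/(\sigma_{22}+\sigma_{33}-2\sigma_{23})^{1/2}$,

(1.6)  $\rho = (\sigma_{22}-\sigma_{23})/[\sigma_{22}(\sigma_{22}+\sigma_{33}-2\sigma_{23})]^{1/2}$

and

(1.7)  $\Psi(\alpha, \beta; \rho) = \int_\alpha^\infty \int_\beta^\infty \varphi_2(u, v; \rho)\,dv\,du$,

$\varphi_2(\cdot, \cdot; \rho)$ being the pdf of the bivariate normal distribution with zero means, unit variances and correlation coefficient $\rho$. The PMC $\pi_{31}$ can be obtained by interchanging the subscripts 2 and 3 in the formula for $\pi_{21}$.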
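For readers who want numbers, the orthant probability $\Psi(\alpha, \beta; \rho)$ in (1.7) reduces to a one-dimensional integral, since for $|\rho| < 1$ the conditional law of $V$ given $U = u$ is $N(\rho u, 1-\rho^2)$. The following is a rough numerical sketch under that identity; the function names, the Simpson step count and the truncation of the upper limit at `max(alpha, 0) + 10` are choices made for the example.

```python
import math


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def psi(alpha, beta, rho, steps=4000):
    """Pr(U > alpha, V > beta) for a standard bivariate normal with |rho| < 1.

    Uses Pr(V > beta | U = u) = 1 - Phi((beta - rho*u) / sqrt(1 - rho^2))
    and Simpson's rule in u over [alpha, max(alpha, 0) + 10].
    """
    lo, hi = alpha, max(alpha, 0.0) + 10.0
    h = (hi - lo) / steps
    scale = math.sqrt(1.0 - rho * rho)

    def integrand(u):
        density = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
        return density * (1.0 - std_normal_cdf((beta - rho * u) / scale))

    total = integrand(lo) + integrand(hi)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(lo + k * h)
    return total * h / 3.0


def pmc_21(s22, s33, s23):
    """pi_21 from (1.4)-(1.6), given sigma_22, sigma_33 and sigma_23."""
    spread = s22 + s33 - 2.0 * s23
    alpha = 0.5 * math.sqrt(s22)
    beta = 0.5 * (s22 - s33) / math.sqrt(spread)
    rho = (s22 - s23) / math.sqrt(s22 * spread)
    return psi(alpha, beta, rho)


# Illustrative case: Sigma = I, mu_1 = (0, 0), mu_2 = (2, 0), mu_3 = (0, 2),
# so that sigma_22 = sigma_33 = 4 and sigma_23 = 0.
print(pmc_21(4.0, 4.0, 0.0))
```

In this illustrative case the rule picks $\mu_2$ exactly when $X_1 > 1$ and $X_1 > X_2$, so the value can be cross-checked against $\tfrac12(1 - \Phi(1)^2)$.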

But in most applications the parameters are not known and a training sample from each population is available: $X_{i1}, \ldots, X_{in_i}$ from $N_p(\mu_i, \Sigma)$, $i = 1, 2, 3$. Estimates (based on the training samples) of the parameters are substituted in (1.1). To get a rule called a plug-in version of $\delta$ we estimate $\mu_i$ by

(1.8)  $\bar X_i = \frac{1}{n_i}\sum_{j=1}^{n_i} X_{ij}$,  $i = 1, 2, 3$,

and $\Sigma$ by $S$, where

(1.9)  $(n_1+n_2+n_3-3)\,S = \sum_{i=1}^{3}\sum_{j=1}^{n_i} (X_{ij}-\bar X_i)(X_{ij}-\bar X_i)'$.

Then the plug-in minimum distance rule $\hat\delta$ decides $\mu = \mu_i$ iff

(1.10)  $(X - \tfrac12(\bar X_i+\bar X_j))'S^{-1}(\bar X_i-\bar X_j) > 0$ and
        $(X - \tfrac12(\bar X_i+\bar X_k))'S^{-1}(\bar X_i-\bar X_k) > 0$.
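The estimates (1.8) and (1.9) and the plug-in rule (1.10) can be illustrated with a small sketch for $p = 2$. The function names and the toy training samples are assumptions made for the example; the rule is implemented in its equivalent minimum-distance form, with the sample means and $S$ in place of the unknown parameters.

```python
def mean2(points):
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]


def pooled_covariance(samples):
    """S from (1.9): pooled within-sample scatter over (n_1 + n_2 + n_3 - 3)."""
    scatter = [[0.0, 0.0], [0.0, 0.0]]
    for pts in samples:
        xbar = mean2(pts)
        for p in pts:
            dev = [p[0] - xbar[0], p[1] - xbar[1]]
            for i in range(2):
                for j in range(2):
                    scatter[i][j] += dev[i] * dev[j]
    dof = sum(len(pts) for pts in samples) - len(samples)
    return [[v / dof for v in row] for row in scatter]


def plug_in_rule(x, samples):
    """Index of the sample mean nearest to x in the metric S^{-1}; cf. (1.10)."""
    (a, b), (c, d) = pooled_covariance(samples)
    det = a * d - b * c
    Sinv = [[d / det, -b / det], [-c / det, a / det]]
    dists = []
    for pts in samples:
        xbar = mean2(pts)
        e = [x[0] - xbar[0], x[1] - xbar[1]]
        dists.append(sum(e[i] * Sinv[i][j] * e[j] for i in range(2) for j in range(2)))
    return dists.index(min(dists))


# Toy training samples (illustrative only), one list per population.
samples = [
    [(0.0, 0.0), (2.0, 0.0)],
    [(0.0, 1.0), (0.0, 3.0)],
    [(5.0, 5.0), (7.0, 7.0)],
]
print(pooled_covariance(samples))
print(plug_in_rule((6.0, 5.0), samples))
```

The divisor in `pooled_covariance` is the total sample size minus the number of populations, which is the quantity written $m$ in (1.13) when the three samples have equal size.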

In this chapter, we obtain asymptotic expansions of the PMC's and the estimated PMC's of the plug-in rule with an error of the order of the inverse square of the number of observations. No such results are available in the literature for more than two populations. Anderson (1973) obtained similar results for the two-population problem.

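1.1 The asymptotic expansions of PMC's.

The PMC's of the plug-in rule $\hat\delta$ will be derived now under the assumption $\mu = \mu_1$.

(1.11)  $P_{21} = \Pr(\hat\delta \text{ decides } \mu = \mu_2 \mid \mu = \mu_1)$
        $= \Pr(\hat U_{21} > 0,\ \hat U_{23} > 0 \mid \mu = \mu_1)$,

where the $\hat U$'s are obtained from the corresponding $U$'s after replacing $\mu_1$, $\mu_2$, $\mu_3$, $\Sigma$ by $\bar X_1$, $\bar X_2$, $\bar X_3$, $S$, respectively. Conditioning on the $\bar X_i$'s and $S$ we get

(1.12)  $P_{21}(\bar X_1, \bar X_2, \bar X_3, S)$
        $= \Psi\big(-(\mu_1-\tfrac12(\bar X_2+\bar X_1))'S^{-1}(\bar X_2-\bar X_1)/[(\bar X_2-\bar X_1)'S^{-1}\Sigma S^{-1}(\bar X_2-\bar X_1)]^{1/2},$
        $-(\mu_1-\tfrac12(\bar X_2+\bar X_3))'S^{-1}(\bar X_2-\bar X_3)/[(\bar X_2-\bar X_3)'S^{-1}\Sigma S^{-1}(\bar X_2-\bar X_3)]^{1/2};$
        $(\bar X_2-\bar X_1)'S^{-1}\Sigma S^{-1}(\bar X_2-\bar X_3)/[(\bar X_2-\bar X_1)'S^{-1}\Sigma S^{-1}(\bar X_2-\bar X_1)\,(\bar X_2-\bar X_3)'S^{-1}\Sigma S^{-1}(\bar X_2-\bar X_3)]^{1/2}\big)$.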


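For simplicity, we shall assume $n = n_1 = n_2 = n_3$ and from now on write

(1.13)  $m = n_1+n_2+n_3-3 = 3n-3$.

The distribution of $(\hat U_{21}, \hat U_{23})$ is invariant with respect to the transformations $X^* = AX+b$, $X^*_{ij} = AX_{ij}+b$, $j = 1, \ldots, n$; $i = 1, 2, 3$, where $A$ is nonsingular. Without loss of generality, we shall replace $\mu_1$, $\mu_2$, $\mu_3$ and $\Sigma$ by $0$, $-\eta_2$, $-\eta_3$ and $I$, respectively, where

(1.14)  $\eta_2 = \Sigma^{-1/2}(\mu_1-\mu_2)$,  $\eta_3 = \Sigma^{-1/2}(\mu_1-\mu_3)$.

Then

(1.15)  $\sigma_{22} = \eta_2'\eta_2$,  $\sigma_{33} = \eta_3'\eta_3$,  $\sigma_{23} = \eta_2'\eta_3$.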
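Define $Y_1$, $Y_2$, $Y_3$ and $V$ by

(1.16)  $\bar X_1 = Y_1/m^{1/2}$,  $\bar X_2 = -\eta_2 + Y_2/m^{1/2}$,  $\bar X_3 = -\eta_3 + Y_3/m^{1/2}$,
        $S = I + V/m^{1/2}$.

The statistics $X$, $\bar X_1$, $\bar X_2$, $\bar X_3$ and $S$ are independently distributed, with $X \sim N_p(\mu_1, \Sigma)$, $\bar X_i \sim N_p(\mu_i, \Sigma/n)$, $i = 1, 2, 3$, and $mS \sim W(\Sigma, m)$. Combining these and the transformations mentioned above, we can assume that $X$, $Y_1$, $Y_2$, $Y_3$ and $V$ are mutually independent and $X \sim N_p(0, I)$; $Y_i \sim N_p(0, mI/n)$, $i = 1, 2, 3$; and $EV = 0$. Then, in terms of the $Y$'s and $V$, (1.12) is

(1.17)  $P_{21}(Y_1, Y_2, Y_3, V) = \Psi(a_m, b_m; r_m)$,

where

(1.18)  $a_m = \tfrac12[-\eta_2+(Y_2+Y_1)/m^{1/2}]'(I+V/m^{1/2})^{-1}[-\eta_2+(Y_2-Y_1)/m^{1/2}]\,/$
        $\{[-\eta_2+(Y_2-Y_1)/m^{1/2}]'(I+V/m^{1/2})^{-2}[-\eta_2+(Y_2-Y_1)/m^{1/2}]\}^{1/2}$,

(1.19)  $b_m = \tfrac12[-(\eta_2+\eta_3)+(Y_2+Y_3)/m^{1/2}]'(I+V/m^{1/2})^{-1}[-(\eta_2-\eta_3)+(Y_2-Y_3)/m^{1/2}]\,/$
        $\{[-(\eta_2-\eta_3)+(Y_2-Y_3)/m^{1/2}]'(I+V/m^{1/2})^{-2}[-(\eta_2-\eta_3)+(Y_2-Y_3)/m^{1/2}]\}^{1/2}$,

(1.20)  $r_m = [-\eta_2+(Y_2-Y_1)/m^{1/2}]'(I+V/m^{1/2})^{-2}[-(\eta_2-\eta_3)+(Y_2-Y_3)/m^{1/2}]\,/$
        $\{[-\eta_2+(Y_2-Y_1)/m^{1/2}]'(I+V/m^{1/2})^{-2}[-\eta_2+(Y_2-Y_1)/m^{1/2}]$
        $\times\,[-(\eta_2-\eta_3)+(Y_2-Y_3)/m^{1/2}]'(I+V/m^{1/2})^{-2}[-(\eta_2-\eta_3)+(Y_2-Y_3)/m^{1/2}]\}^{1/2}$.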
(1.21)  $(I+V/m^{1/2})^{-1} = I - V/m^{1/2} + V^2/m - V^3/m^{3/2} + V^4/m^2 - V^5(I+V/m^{1/2})^{-1}/m^{5/2}$,

(1.22)  $(I+V/m^{1/2})^{-2} = I - 2V/m^{1/2} + 3V^2/m - 4V^3/m^{3/2} + 5V^4/m^2$
        $-\,(6V^5+5V^6/m^{1/2})(I+V/m^{1/2})^{-2}/m^{5/2}$.
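The two matrix identities (1.21) and (1.22) are exact, not just asymptotic: writing $A = V/m^{1/2}$, they read $(I+A)^{-1} = I - A + A^2 - A^3 + A^4 - A^5(I+A)^{-1}$ and $(I+A)^{-2} = I - 2A + 3A^2 - 4A^3 + 5A^4 - (6A^5+5A^6)(I+A)^{-2}$. The sketch below checks them numerically for a $2 \times 2$ example; the matrix `V`, the value of `m` and the helper names are arbitrary illustrative choices.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def lincomb(terms):
    """Sum of coefficient * matrix over (coefficient, matrix) pairs."""
    return [[sum(c * M[i][j] for c, M in terms) for j in range(2)] for i in range(2)]


def inv2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def max_abs_diff(A, B):
    return max(abs(A[i][j] - B[i][j]) for i in range(2) for j in range(2))


I2 = [[1.0, 0.0], [0.0, 1.0]]
V = [[0.3, -0.2], [-0.2, 0.5]]                    # arbitrary symmetric matrix
m = 25.0
A = [[v / m ** 0.5 for v in row] for row in V]    # A = V / m^{1/2}

powers = [I2]
for _ in range(6):
    powers.append(matmul(powers[-1], A))          # powers[k] = A^k

inv1 = inv2(lincomb([(1, I2), (1, A)]))           # (I + A)^{-1}
inv_sq = matmul(inv1, inv1)                       # (I + A)^{-2}

rhs_121 = lincomb([(1, powers[0]), (-1, powers[1]), (1, powers[2]),
                   (-1, powers[3]), (1, powers[4]),
                   (-1, matmul(powers[5], inv1))])
rhs_122 = lincomb([(1, powers[0]), (-2, powers[1]), (3, powers[2]),
                   (-4, powers[3]), (5, powers[4]),
                   (-1, matmul(lincomb([(6, powers[5]), (5, powers[6])]), inv_sq))])

err_121 = max_abs_diff(inv1, rhs_121)
err_122 = max_abs_diff(inv_sq, rhs_122)
print(err_121, err_122)
```

Both discrepancies should be at the level of floating-point rounding, which also confirms the minus sign on the remainder term of (1.21).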

Let $J_m$ be the subset of the sample space defined as

(1.23)  $J_m = \{|Y_{ki}| < g(\log m)^{1/2},\ |v_{ij}| < 2\log m;\ k = 1, 2, 3,\ i, j = 1, \ldots, p$; $g$ is a constant greater than 4$\}$,

where $Y_k = (Y_{k1}, \ldots, Y_{kp})'$. A lemma of Anderson (1973) yields the following:

Lemma 1.1  $\Pr(J_m) = 1 - o(m^{-2})$.

Consequently, since $0 \le \Psi(a_m, b_m; r_m) \le 1$ we have

(1.24)  $P_{21} = E\,\Psi(a_m, b_m; r_m)\chi(J_m) + E\,\Psi(a_m, b_m; r_m)\chi(J_m^c)$
        $= E\,\Psi(a_m, b_m; r_m)\chi(J_m) + o(m^{-2})$,

where $\chi(A)$ stands for the indicator function of a set $A$.


Define 


t 

I 
(1.27)  (TlgV  -it-27l2V712*'2T12(Y2‘Yl)] 

m~ 

i 4 

+ m[^2V%^(Y2-Yl)+(Y2'Yl),(y2"YI)+Y2m(Yl*Y2’V)]) 

- -t  + 4ni:(Y  -y.J+iU^pJ/c372 

G2  m2  2 2 1 22 

+ “( t ^V2H2^V(Y2-Y1)+(Y2-Y1)  ’ (Y2-Yx)  ]/2G3/2 
+3[^(Y2-Y1)+7]^VT|2]a/2G5/2}+Y3ni(Y1,Y2,V), 

where $\gamma_{2m}(Y_1, Y_2, V)$ and $\gamma_{3m}(Y_1, Y_2, V)$ have the same properties as $\gamma_{1m}(Y_1, Y_2, V)$. The notation $\gamma_{jm}(Y_1, Y_2, Y_3, V)$ will be used frequently; it will have the same properties as those of $\gamma_{1m}$, unless mentioned otherwise.

Combining  (1.26)  and  (1.2?),  we  get 

(1.28)  Gm  = a+C/m^+D/nH-Y4m(Y1,Y2,V), 
where 

(1.29)  C = “^(Yg+Yj  )/2G^f 

D = [ -•^Van2+h7|^VY1+(Y2+3Y1 ) ’ (Y^-Y^ ) }/k£ 

+ [ (T)2V\)a-^2VT12Tl2Y1-Tl2(Y2+3Y1)(Y2-Y1)  ,T123/4g3/2* 

The  numerator  of  b^  in  (I.I9)  is  ^ times 

(1.30)  e + -ic  - ( Hg+'H 3 ) ' V (Tl2 -71 3 ) -2 (*n^Y2 -*n^Y3 ) 3 

+ £[(VV  ,v2(7l2-H3)+2(^vY2-ll'VY3)+(Y2+Y3)  '(Y2-Y3)  ] 


+ WVV)* 


The  denominator  of  bQ  in  (1.19)  is 


(!•»)  5 * ^(<VV'(VV+(VV'V(VV1/o3 

01 

+ ~{-t3(\-Tl3),V2(Tl2-T|3)+(Y2-Y3),(Y2-Y3) 
+4(T12-T)3)'v(Y2-Y3)]/2a3 
+31  ' (y2-y3)+(ti2-1)3)  ’v(VT)3>  3s/2*5) 

^6m<Y2’Y3’V>- 

Combining  (I.30)  and  (1.31),  we  get 

(1.32)  bm  « 3+F/mWmfYTm(Y2,Y3,V), 
where 

(1.33)  F - -[(T12+T)3),V(T)2-713)+2(T1^Y2-T1^Y3)]/2c 

+et(Yl2-Tl3)/(Y2-Y3)+(ll2-713),V(Tl2-Yl3)]/2cT3 
G - -e[3(H2-Tl3),V2(Tl2-‘n3)+(Y2-Y3),(Y2-Y3) 

+^(VT*3)  'V<  VY3)  t Ol2-V  ’ (Y2"Y3) 

+(T12-*n3)  ’V(T12-T)3)  ]2/W5+[ (Tlg+Tl3)  ’V2(T12-H3) 

+2(T1^VY2-71  •VY3)+(Y2+Y3)  * (Y2-Y3)  ]/2o 

-«VV  '^VV^VV ' < VV ,[  (VV  'v<VV 

+2(T12Y2-713Y3)  J/2o3 
The  numerator  of  r in  (l .20)  is 
(1.34)  f + -T^V 0l2-T13 ) -1)g (y2-y3)- (ti*-h3) ' ( y2 -Yx ) ] 

+ ^3H2V2(T12-T13)+2T^V(Y2-Y3)+2(T12-T13)  ’vfrg-Y^ 
+(Y2-Y1)'(Y2-Y3)]+  Y8ra(Y,,Y2,Y3.V). 

Combining  (l.34),  (l.27)>  and  (l.3l),  we  get 

(1.35)  rm  = p+H/m^+K/mFy9m(Y1,Y2,Y3,V), 
where 

(1.36)  H * -[2T^V(712-Tl3)-^(Y2-Y3)-(T12-T]3),(Y2-Y1)]/G~a 

+f  [Dgt  VYi  )+T12V7)2 1 

+f  [(*n2-‘n3)  • (y2-y3)+(ti2-ti3)  •v(t)2-ti3)  j a&3 
+4(H2-T)3)  'v(Y2-Y3)  ]/2gM 

+3f  t (V713)  ’ ( VY3)+(  VV  'V<VV  Je/2G^5 

+[3Tl^V2(U2-‘n3)+2^V(Y2-Y3)^(Tl2-Tl3),v(Y2-Y1) 

+(y2-y1),(y2-y3)]/g^o 

-f  [ ^V^g+fYg-Yj  ) ’ (Y2-Yx  )+4^V(Y2-Y1 ) ]/2G3/2a 
+3f  [^(Yg-Yj  )+71^VTV2  ]s/2G 5/2a 
+f  [T^(  Y2-Yx  J+T^VTlg  ] [ (Tt,-^) ' (Y2-Y,) 

+(ti2-t)3),v(ti2-t)3)]/g3/2o3 


$ 


+[2^V(T12-T13)+^(Y?-Y3)+(T12-T13),(Y2-Y1) ] x 

i <vV ' ( vv^vV  *v<vV  1/G^<y3 

-[2D^V(T]2-,n3)+Tl^(Y2-Y3)+(Tl2-,n3)  ' (Y2-Yx)  ] x 
t^(Y2-Yi)+71^Vlfl2]/G3/2a. 

We assume that $|\rho| \ne 1$. This means that $\eta_2$ and $\eta_3$ are not linearly related.

Then a Taylor series expansion of $\Psi(a_m, b_m; r_m)$ over $J_m$ for sufficiently large $m$ (see Appendix for the detailed derivation) gives

(1.37)  YCCvVrJ 

■ Y(«,0  ;p  )-HP1(«)*1(g^i/2t)  [ -C  /m^+c^c2  /hm-D  /m  ] 
■Hpi(p)li(f/2aT)  [eF2Amff-F/m^-G/m] 

■KPX  («)q»1(-G^rq/2T)  (G^ct/t)  [ -fC2/2mG^<r»CF/2BHG"?CH/4mT2  ] 

(p  )q>j  ( - 5/20  T ) (G^a  / T ) [ - fF  2 /2tnG^a+CF /2m+GcrqFH /Umt2  ] 

•*tp2(a,e;p)  (H/m^+K/nri-  (GCT/2T2  ) [ f /G^+G^q  ?/4  T2  ]H2/m 

+(G^?A t2 )CH/»f (Gqa /4 t2 )FH/m)  + y10  (Y1,Y2,Y3,V)/bi3/2 

♦ YU  (YrY2,Y3,V)+  Y12.(Y1,Y2,Y3.V). 
x 

where $\Phi_1(x) = \int_{-\infty}^{x} \varphi_1(y)\,dy$, $\varphi_1(\cdot)$ being the pdf of the standard normal distribution; $\gamma_{10}(Y_1, Y_2, Y_3, V)$ is a homogeneous polynomial (not depending on $m$) of degree 3 in the elements of $Y_1$, $Y_2$, $Y_3$ and $V$; $\gamma_{11}(Y_1, Y_2, Y_3, V)$ is a polynomial of degree 4; and $\gamma_{12m}(Y_1, Y_2, Y_3, V)$ is a remainder term, which is $O(m^{-5/2})$ for fixed $Y_1$, $Y_2$, $Y_3$ and $V$ in $J_m$.


Since $J_m$ is by definition symmetric in $Y_1$, $Y_2$ and $Y_3$, $C$ has expectation zero over $J_m$. Let $h$ be a function of $Y_1$, $Y_2$, $Y_3$ and $V$ having finite second moment. Then

(1.38)  $|Eh - Eh\chi(J_m)| = |Eh\chi(J_m^c)|$
        $\le |Eh^2|^{1/2}\,|E\chi(J_m^c)|^{1/2}$
        $= o(m^{-1})$.


Consequently, the differences between $ED/m$, $EG/m$, $EK/m$, $EC^2/m$, $EF^2/m$, $EH^2/m$, $ECF/m$, $ECH/m$, $EFH/m$, $E\gamma_{10}(Y_1, Y_2, Y_3, V)/m^{3/2}$ and the corresponding expectations over $J_m$ are $o(m^{-2})$. Moreover,


for  any  positive  integer  t. 


(1.39)  “ mt  " O(nT^) 


Hence

(1.40)  $|EF/m^{1/2} - EF\chi(J_m)/m^{1/2}|$
        $= |EF\chi(J_m^c)/m^{1/2}|$
        $\le (1/m^{1/2})\,|EF^3|^{1/3}\,|E\chi(J_m^c)|^{2/3}$
        $= (1/m^{1/2})\,(O(m^{-1/2}))^{1/3}\,(o(m^{-2}))^{2/3}$
        $= o(m^{-2})$.


Similarly,

$|EH/m^{1/2} - EH\chi(J_m)/m^{1/2}| \le o(m^{-2})$.

Note also that $EF = 0$, $EH = 0$. Since the fourth-order absolute moments of $Y_1$, $Y_2$, $Y_3$ and $V$ exist and are bounded, so is $E\gamma_{11}(Y_1, Y_2, Y_3, V)$. Hence $(1/m^2)E\gamma_{11}(Y_1, Y_2, Y_3, V)\chi(J_m) = O(m^{-2})$. Finally, in $J_m$ each element of $Y_1$, $Y_2$, $Y_3$ and $V$ divided by $m^{1/2}$ is less than a constant times $\log m/m^{1/2}$, therefore

(1.41)  $E|\gamma_{12m}(Y_1, Y_2, Y_3, V)\chi(J_m)| = O(m^{-5/2}\log^5 m) = o(m^{-2})$.

Thus

(1.42)  $E\,\Psi(a_m, b_m; r_m)\chi(J_m) = \Psi(\alpha, \beta; \rho) + Q/m + E\gamma_{10}(Y_1, Y_2, Y_3, V)/m^{3/2} + O(m^{-2})$,

where

(1.43)  Q = 91(*)«1(C^q/2T)C[C^8/4-D]-»Vl(p)91(5/atrT)C[rf2/4<y-cJ 

+*1  (a)<P1(-6-q/2T)(G^r/T)e[  -fCa/2G^j+CF/2-G^?CH/4T2] 

■^1  (P  )<Pj_  ( “ ?/2o  t)  (G^t/  t Jef-fF2  /2G^ej+CF  /2+Gbq  FH/4  t2  ] 
■Kp2(«,P  ;p  )C{K+Q[j/2T2[f/G^-Q^q5/4T2  3Ha+Q^^cH/4T2 
+GjqFH/4ra) 

Since the third moments of $Y_1$, $Y_2$, $Y_3$ and $V$ are either zero or $O(m^{-1/2})$, combining (1.24) and (1.42), we have

(1.44)  $P_{21} = \Psi(\alpha, \beta; \rho) + Q/m + O(m^{-2})$.

Finally we have (see Appendix for details),


( 1.4 5)  p21  = *(®.p;p) 

+ “|cp1(«)$i(G^q/2T)  [(2p+l)G^/8+3(p-l)/2G?] 

-KPl(p)$l(?/2aT)(e/4<7)  [(p-l)/4+(,r2+3(G+b)+6(p-l))/cr2 
-3  (G-b  )2 /2ct^  ] - ( 3 78^  (cr  ^ (3 ) ( f -d  ) / t 


-Kp^P)^  -|/2ot)  (l/<ro2)  [ -if  ( -t2-3e2/2or2)+(  1/8)  ( -9G2 
+8Gq+5b2+4d2)  ]+cp2(a,p  ;p  )((l/c¥a)  t3(p-l)-3(p-l)q/G 
-3(p-l)f/CT2-2f+f2(3+2f)/GCT2]+?/8G'2CT3 
+q(llQ+5b-4d  )/8G%3+(l/aT2  ) [ f /G^+G^cAt^  ] [ -4f  2 
- 3f - 3f 2/G-3f 2/CT2+3a2+  3G+2G&2 


-2f 3( 3+2f ) /Go2  3 } [ +0(m“ 2 ) . 


The asymptotic expansion of the PMC $P_{31}$ can be obtained by interchanging the subscripts 2 and 3 in (1.45).


1.2 The asymptotic expansions of estimated PMC's.

We estimate $P_{21}$ by considering $X$ distributed as $N_p(\bar X_1, S)$. Then, in terms of the $Y$'s and $V$, the estimate is

(1.46)  Pgl(VVY3,V)  - 


where 
(1.47)  $a_m = \tfrac12\{[-\eta_2+(Y_2-Y_1)/m^{1/2}]'(I+V/m^{1/2})^{-1}[-\eta_2+(Y_2-Y_1)/m^{1/2}]\}^{1/2}$,

(1.48)  $b_m = \tfrac12[-(\eta_2+\eta_3)+(Y_2+Y_3-2Y_1)/m^{1/2}]'(I+V/m^{1/2})^{-1}[-(\eta_2-\eta_3)+(Y_2-Y_3)/m^{1/2}]\,/$
        $\{[-(\eta_2-\eta_3)+(Y_2-Y_3)/m^{1/2}]'(I+V/m^{1/2})^{-1}[-(\eta_2-\eta_3)+(Y_2-Y_3)/m^{1/2}]\}^{1/2}$,

(1.49)  $r_m = [-\eta_2+(Y_2-Y_1)/m^{1/2}]'(I+V/m^{1/2})^{-1}[-(\eta_2-\eta_3)+(Y_2-Y_3)/m^{1/2}]\,/$
        $\{[-\eta_2+(Y_2-Y_1)/m^{1/2}]'(I+V/m^{1/2})^{-1}[-\eta_2+(Y_2-Y_1)/m^{1/2}]$
        $\times\,[-(\eta_2-\eta_3)+(Y_2-Y_3)/m^{1/2}]'(I+V/m^{1/2})^{-1}[-(\eta_2-\eta_3)+(Y_2-Y_3)/m^{1/2}]\}^{1/2}$.

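As before, over $J_m$ for sufficiently large $m$, Taylor series expansions give:

(1.50)  $a_m = \alpha + C^*/m^{1/2} + D^*/m + \gamma_{13m}(Y_1, Y_2, V)$,

where

(1.51)  $C^* = -[\eta_2'V\eta_2 + 2\eta_2'(Y_2-Y_1)]/4\sigma_{22}^{1/2}$,

        $D^* = [\eta_2'V^2\eta_2 + 2\eta_2'V(Y_2-Y_1) + (Y_2-Y_1)'(Y_2-Y_1)]/4\sigma_{22}^{1/2}$
        $-\,[\eta_2'V\eta_2 + 2\eta_2'(Y_2-Y_1)]^2/16\sigma_{22}^{3/2}$.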
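A quick numerical sanity check of the expansion of $a_m$ can be made by evaluating $a_m = \tfrac12\{w'(I+V/m^{1/2})^{-1}w\}^{1/2}$ with $w = -\eta_2 + (Y_2-Y_1)/m^{1/2}$ directly and comparing it with $\tfrac12(\eta_2'\eta_2)^{1/2} + C^*/m^{1/2} + D^*/m$, where $C^*$ and $D^*$ are the first- and second-order Taylor coefficients of that square root. The sketch below does this for $p = 2$; the numerical values of $\eta_2$, $Y_1$, $Y_2$ and $V$ are arbitrary illustrative choices, and the difference should shrink like $m^{-3/2}$.

```python
import math


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


def matvec(M, u):
    return [M[0][0] * u[0] + M[0][1] * u[1], M[1][0] * u[0] + M[1][1] * u[1]]


def a_m_exact(eta2, y1, y2, V, m):
    """a_m evaluated directly for p = 2 (V symmetric)."""
    r = math.sqrt(m)
    w = [-eta2[i] + (y2[i] - y1[i]) / r for i in range(2)]
    a, b = 1.0 + V[0][0] / r, V[0][1] / r
    c, d = V[1][0] / r, 1.0 + V[1][1] / r
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    return 0.5 * math.sqrt(dot(w, matvec(inv, w)))


def a_m_expansion(eta2, y1, y2, V, m):
    """alpha + C*/m^{1/2} + D*/m."""
    s22 = dot(eta2, eta2)
    t = [y2[0] - y1[0], y2[1] - y1[1]]
    Veta = matvec(V, eta2)
    first = dot(eta2, Veta) + 2.0 * dot(eta2, t)
    second = dot(Veta, Veta) + 2.0 * dot(Veta, t) + dot(t, t)
    c_star = -first / (4.0 * math.sqrt(s22))
    d_star = second / (4.0 * math.sqrt(s22)) - first ** 2 / (16.0 * s22 ** 1.5)
    return 0.5 * math.sqrt(s22) + c_star / math.sqrt(m) + d_star / m


# Arbitrary illustrative inputs.
eta2, y1, y2 = [1.5, -0.5], [0.4, -0.3], [-0.2, 0.7]
V = [[0.6, -0.1], [-0.1, -0.4]]


def expansion_error(m):
    return abs(a_m_exact(eta2, y1, y2, V, m) - a_m_expansion(eta2, y1, y2, V, m))


for m in (1e2, 1e4, 1e6):
    print(m, expansion_error(m), expansion_error(m) * m ** 1.5)
```

If the last printed column stays roughly constant as `m` grows, the neglected term is indeed of order $m^{-3/2}$ for fixed $Y$'s and $V$.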

(1.52)  bm  = P + F*/m*  +G*/m  + Yl4m(VY2’Y3’V)’ 
where 

(1.53)  F*  = [-T)^VTl2+Tl'VTl3-2Tl,2(Y2.Y1)+2Tl’(Y3-Y1)]/2a 

+e  1 0l2-T)3)  ,V(T12-Tl3)+2('n2-713) ' (Y2-Y3)  ]/4a3. 
G = -e[(Tl2-T)3),V2(712-713)+2(Tl2-Tl3)'v(Y2-Y3)+(Y2-Y3)' 
(Y2-Y3)]/4a3 

+3e  t 012-1)3)  'v(712-T13)-^(T12-T13)  • (Y2-Y3)  ]2/l6a5 
+f-Tl2VTl2+'n3VTl3-2^(Y2-Y1)+2'n^(Y3-Y1)  ] x 

[(1)2-113)  ‘^VV^VV ' ( VV  ]/2°3- 

^ ^ ) m ^ l K/  Y15»<Y1’W>- 


(1.55)  H*  = f[^VT12+2^(Y2-Y1)]/2G3/V[-T)^VTl2+Tl^VTl3 

“UgO VY1  )+112  (Y3"Y1  )+T13 ( Y2'Y1 > ]/G*a 

+f  1 0l2-113)  'V(T)2-T13)^(T12-Tl3)  * (Y2-Y3)  ]/2G^a3 

K = V2  (T|2  3 ) +2T1^V  ( y2  - yi  ) C Y3  - Yi ) -^3  C Y2  “Yi ) 

+<y2-yi  ) • ( y2-yi  ) “ ( y3“yi  ) * (VY1>  ]/g^ct 
-f  [(1)2-113)  '^VV^VV  'V(Y2"Y3)+(Y2-Y3) ' ( 
2cV+3f  [(1)2-1)3) ' V (TI2 -H 3 )+2 (Tl2 -1) 3 ) 'V(Y2-Y3)  ]2/ 
8G^a5-f  [1)2V2112+211^V(Y2-Y1)+(Y2-Y1) ' (Yg-Y^  ]/2G3/2<J 
+3f  [DgV > )s/8G5/2^+f^ [ (1)2-113)  ,v(D2-T13)+ 
2(D2-1)3) ' (y2-Y3)  3 [Tl2Vn2+21f,2(Y2‘Yl J 3/2G3/2o3 
+[-1)2V(l)2-1)3)-2l)2(Y2‘Yl)+^2(Y3‘Yl)+^3(VYl)]  X 
[ (i)2-D3)  ^012-1)3)^  (D2-1I3) ' (y2_y3)  3 /2C&3 


J 
+ [^2V^2^3)"2^2(Y2_Y1)+^2(Y3"Y1)+713(Y2"Y1)]  X 
[D^VT)2+2T1^(Y2-Y1)  ]/2G3/2ct. 

By going through exactly the same arguments as in Section 1.1, if $|\rho| < 1$, we have

(1.56)  $E\hat P_{21}(Y_1, Y_2, Y_3, V)$

= Y(a,p;p) 

+ ^jcp1(a)$1(G^q/2T)[G3/2/32-(p-l)G^/4-3(p-l)/2G^] 
•«P1(p)$1(?/2aT)(e/Ua)t(2p+l)/2+(e2+8T2+48(G+b-d) 
+48(p-l))/8a2-3e2/2ai|]-cp1(a)cp1(-G2q/2T)(l/T)  x 
[|f(G/8+3/2)+(2(G2-ds)+l2f+ef(f+6)/c2)/l6 
“(G(f+6)^/l6'r2)(l-f2/Qj2)  ]+q>^(p )<p^(-^/2aT) (l/ t)  X 
{-(f/2)[(e2+8i2+48(G+q))/Sa2-3e2/2a3]+[2(G2-d2) 

+12  (G+f ) +e  f ( f +6 ) /a2  ] / 16h- (Gq  / 16  t2  ) [ -G2+Gd+ 3Gb-!-bd 
-4d2+6(G+3b)+2df (d+6)/G-l2ef/a2 
+ef2(f+6)/Ga2  ])+q>2(a.P  ;p ) ( [-l2-3(p-l  )f /a2 
-( 3(P"1 ) f+6d ) /G+f  2(  f +6) /Go2  ] /G2a 
+ ( 1/2  T^a  ) [ f /G^+G^q  T2  ] [ - f 2+6(  b+d ) -6d2/G 

-6f2/a2+f2(f+6)/G£r2]+(G^/l6T2a)(l-f/GCT2) 
+(G^q/l6'fia)  [ -GT +Gd+3Gb+bd-4d2+6(G+3b  ) 


+2df (d+6)/G-l2ef/a2+ef2(fi6)/Ga2]} }+0(m~2). 
Interchanging the subscripts 2 and 3 in (1.56) gives the corresponding expansion for the estimate of $P_{31}$.


CHAPTER  2 

ASYMPTOTIC  PMC'S  OF  NEAREST  NEIGHBOR  RULES  BASED  ON  RANKS 
2.0  Introduction 

The nearest neighbor (NN) rule for classifying an observation $Z$ into one of two given populations $\pi_1$ and $\pi_2$ was first introduced and studied by Fix and Hodges (1951). The rule can be described as follows: Let $(X_1, \ldots, X_{n_1})$ and $(Y_1, \ldots, Y_{n_2})$ be random training samples from $\pi_1$ and $\pi_2$, respectively. Using a distance function $d$, rank the distances of all the observations from $Z$, and classify $Z$ into $\pi_i$ if the nearest observation to $Z$ comes from $\pi_i$. This rule was also studied by Cover and Hart (1967) and Cover (1968), who generally considered sampling from a population which is a mixture of $\pi_1$ and $\pi_2$.

The NN rule can't be applied if the observations are available only in terms of their ranks (or relative orders). In this chapter a rule is suggested which uses the basic idea of the NN rule but is expressed only in terms of the ranks of the observations. The rule is given below and is termed the Modified Nearest Neighbor (MNN) rule.

Pool all the observations $Z$, $X_i$'s and $Y_j$'s, and note their relative orders (or ranks). Let $U$ and $V$ be the nearest observations to $Z$ from the left and from the right, respectively, in the pooled training sample. When either $U$ or $V$ is not defined we define it to be $Z$. The MNN rule can now be described as follows:

(i) If $Z$ is the smallest observation, then classify it to $\pi_1$ (or $\pi_2$) when $V$ is an X (or Y) observation.

(ii) If $Z$ is the largest observation, classify it to $\pi_1$ (or $\pi_2$) when $U$ is an X (or Y) observation.

(iii) If both $U$ and $V$ are X (or Y) observations, classify $Z$ into $\pi_1$ (or $\pi_2$).

(iv) If $U$ and $V$ are not from the same population, classify $Z$ into $\pi_1$ and $\pi_2$ with probability 1/2 and 1/2, respectively.

The basic idea of this rule is taken from Anderson (1966), where he discusses classification rules based on tolerance regions.
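Cases (i) to (iv) of the MNN rule translate directly into code. The following is a minimal sketch for scalar observations without ties; the function name `mnn_classify` and the `rng` argument (any object with a `random()` method, used for the randomized case) are assumptions made for the example.

```python
import random


def mnn_classify(z, xs, ys, rng=random):
    """Modified nearest neighbor rule for scalar data; returns 1 or 2.

    Only the relative order of z, the X's and the Y's is used.
    Assumes no ties among the pooled observations.
    """
    left = [(v, 1) for v in xs if v < z] + [(v, 2) for v in ys if v < z]
    right = [(v, 1) for v in xs if v > z] + [(v, 2) for v in ys if v > z]
    u = max(left)[1] if left else None     # label of U, nearest from the left
    v = min(right)[1] if right else None   # label of V, nearest from the right
    if u is None and v is None:
        raise ValueError("empty training sample")
    if u is None:            # (i) Z is the smallest observation
        return v
    if v is None:            # (ii) Z is the largest observation
        return u
    if u == v:               # (iii) both neighbors from the same population
        return u
    return 1 if rng.random() < 0.5 else 2   # (iv) randomize with probability 1/2


xs, ys = [1.0, 4.0, 7.0], [6.0, 9.0]
print(mnn_classify(2.0, xs, ys))    # both neighbors are X observations
print(mnn_classify(10.0, xs, ys))   # Z is largest; its left neighbor is a Y
```

Because the function only compares values with `z`, any monotone transformation of the data leaves the decision unchanged, which is the sense in which the rule depends on ranks alone.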

In Section 2.1 we have derived the asymptotic (as $n_1, n_2 \to \infty$) PMC of the MNN rule. It turns out that the asymptotic PMC of our rule is the same as the respective asymptotic PMC of the NN rule as derived by Fix and Hodges (1951).

Moreover, to reduce randomization in the MNN rule, we may modify the rule in the following way. If the left nearest neighbor and the right nearest neighbor are not from the same population, consider the next smaller and the next larger observations and denote the new left neighbor by $U_2$ and the new right neighbor by $V_2$. Then classify $Z$ into $\pi_1$ (or $\pi_2$) if both $U_2$ and $V_2$ are X (or Y) observations; if $U_2$ and $V_2$ are not from the same population, classify $Z$ into $\pi_1$ and $\pi_2$ with probability 1/2 and 1/2, respectively. This will be called the two-stage MNN rule. When $U_2$ and $V_2$ are not from the same population, we may consider the next smaller and the next larger observations and classify according to the new left and right neighbors $U_3$ and $V_3$ as above. This defines the three-stage MNN rule. We shall derive the asymptotic PMC's of the two-stage and the three-stage MNN rules. When training samples are drawn from a population which is a mixture of $\pi_1$ and $\pi_2$ we derive the asymptotic risks of the two-stage and the three-stage MNN rules and extend this to obtain the asymptotic risk of the K-stage MNN rule. It is shown that this multi-stage MNN rule reduces not only the probability of randomization but also the asymptotic risk.
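The multi-stage idea can be sketched as a loop that moves one rank outward on each side whenever the current pair of neighbors disagrees. This is only an illustration for scalar data without ties: the function name `multistage_mnn` is hypothetical, and the handling of a side that runs out of observations (the decision then rests on the other side alone) is an assumption, since the text above describes the later stages only for interior points.

```python
import random


def multistage_mnn(z, xs, ys, stages, rng=random):
    """K-stage MNN sketch: widen the neighborhood one rank per side per stage.

    Returns 1 or 2. If one side runs out of observations, the decision at that
    stage rests on the remaining side alone (an assumption, see the lead-in).
    """
    left = sorted([(v, 1) for v in xs if v < z] + [(v, 2) for v in ys if v < z],
                  reverse=True)                       # nearest first
    right = sorted([(v, 1) for v in xs if v > z] + [(v, 2) for v in ys if v > z])
    for s in range(stages):
        labels = {side[s][1] for side in (left, right) if s < len(side)}
        if len(labels) == 1:          # neighbors at this stage agree
            return labels.pop()
        if not labels:                # both sides exhausted
            break
    return 1 if rng.random() < 0.5 else 2   # final-stage randomization


xs, ys = [1.0, 4.0, 7.0], [6.0, 9.0]
# Stage 1 sees 4 (X) and 6 (Y) and disagrees; stage 2 sees 1 (X) and 7 (X).
print(multistage_mnn(5.0, xs, ys, stages=2))
```

With `stages=1` the function reduces to the basic MNN rule for interior points.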

In Section 2.4 we define the rank analogue of the $K_n$-nearest neighbor ($K_n$-NN) rule. The $K_n$-NN rule was first introduced and studied by Fix and Hodges (1951), and later modified by Cover (1968). The modified rule can be described as follows. Let $M_{n_i}$ be the number of observations in the pooled training sample from the population $\pi_i$ that belong to the $k_n$ nearest neighbors (with respect to some distance measure) of $Z$. Then the $K_n$-NN rule decides $Z$ as $\pi_i$ if $M_{n_i} = \max_{j=1,2} M_{n_j}$. We propose a "Modified $K_n$-Nearest Neighbor" (M$K_n$-NN) rule, which uses the basic idea of the $K_n$-NN rule but is expressed only in terms of the ranks of the observations. The rule is given below.


Let $U_n$ be the $k_{n,1}$th nearest observation to $Z$ from the left and $V_n$ be the $k_{n,2}$th nearest observation to $Z$ from the right in the pooled training sample. When $U_n$ (or $V_n$) is not defined as described above, we define it to be the smallest (or the largest) observation in the pooled sample (including $Z$). Then the M$K_n$-NN rule is defined as follows:

(i) If there are more X (or Y) observations in the closed interval $[U_n, V_n]$, classify $Z$ into $\pi_1$ (or $\pi_2$).

(ii) If there are equal numbers of X observations and Y observations in $[U_n, V_n]$, classify $Z$ into $\pi_1$ and $\pi_2$ with probability 1/2 and 1/2, respectively.
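The M$K_n$-NN rule amounts to a majority vote over the observations lying between the $k_{n,1}$th left neighbor and the $k_{n,2}$th right neighbor of $Z$. Below is a minimal sketch for scalar data without ties; the function name `mknn_classify`, the argument names `k_left` and `k_right`, and the `rng` hook for the randomized tie are assumptions made for the example.

```python
import random


def mknn_classify(z, xs, ys, k_left, k_right, rng=random):
    """Rank-based K_n-NN sketch; returns 1 or 2.

    Counts X's and Y's among the k_left pooled observations nearest to z from
    the left and the k_right nearest from the right, i.e. in [U_n, V_n].
    If a side has fewer observations than requested, all of them are used.
    """
    left = sorted([(v, 1) for v in xs if v < z] + [(v, 2) for v in ys if v < z],
                  reverse=True)[:k_left]
    right = sorted([(v, 1) for v in xs if v > z] + [(v, 2) for v in ys if v > z])[:k_right]
    labels = [lab for _, lab in left + right]
    n_x, n_y = labels.count(1), labels.count(2)
    if n_x > n_y:
        return 1
    if n_y > n_x:
        return 2
    return 1 if rng.random() < 0.5 else 2   # equal counts: randomize


print(mknn_classify(5.0, [1.0, 2.0, 3.0, 5.5], [6.0, 7.0], k_left=2, k_right=2))
```

Taking all available observations when a side is short corresponds to setting $U_n$ (or $V_n$) to the smallest (or largest) pooled observation, as in the definition above.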

We shall derive the asymptotic PMC of the M$K_n$-NN rule when $k_{n,i} \to \infty$ and $k_{n,i}/n \to 0$ as $n = \min(n_1, n_2) \to \infty$. When training samples are drawn from a population which is a mixture of $\pi_1$ and $\pi_2$, the asymptotic risk of the M$K_n$-NN rule turns out to be exactly the Bayes risk.

The $K_n$-NN rule was obtained using the $K_n$-NN estimates of the density functions as suggested by Fix and Hodges (1951) and Loftsgaarden and Quesenberry (1965). However, it may be noted that the density functions can't be estimated using the ranked observations only.
2.1 Asymptotic values of the conditional PMC of the MNN rule.

Let the c.d.f.'s of $X_i$ and $Y_j$ be $F_1$ and $F_2$, respectively. We shall assume that $f_1$ and $f_2$ are the p.d.f.'s corresponding to $F_1$ and $F_2$, respectively, with respect to Lebesgue measure. Denote by $X_{(1)} < X_{(2)} < \cdots < X_{(n_1)}$ and $Y_{(1)} < Y_{(2)} < \cdots < Y_{(n_2)}$ the order statistics of $(X_1, \ldots, X_{n_1})$ and $(Y_1, \ldots, Y_{n_2})$, respectively.

Let $P_{n_1,n_2}(z)$ and $Q_{n_1,n_2}(z)$ be the conditional probabilities that the MNN rule classifies the observation $Z$ into $\pi_1$ and $\pi_2$, respectively, given $Z = z$. Note that $P_{n_1,n_2}(z)$ is the conditional PCC and $Q_{n_1,n_2}(z)$ is the conditional PMC when $Z \sim F_1$, given $Z = z$. We can write

(2.1)  $P_{n_1,n_2}(z) = \sum_{i=1}^{4} P_{n_1,n_2,i}(z)$,

where

(2.2)  $P_{n_1,n_2,1}(z) = \Pr(Z \le X_{(1)} < Y_{(1)} \mid Z = z)$,

(2.3)  $P_{n_1,n_2,2}(z) = \Pr(Z \ge X_{(n_1)} > Y_{(n_2)} \mid Z = z)$,

(2.4)  $P_{n_1,n_2,3}(z) = \Pr(X_{(i)} \le Z \le X_{(i+1)}$ for some $i$; $Y_{(j)} \notin [X_{(i)}, X_{(i+1)}]$ for every $j \mid Z = z)$,

(2.5)  $P_{n_1,n_2,4}(z) = \tfrac12\{\Pr(X_{(i)} < Z < Y_{(j)}$ for some $i$ and $j$ but no other observations fall in $[X_{(i)}, Y_{(j)}] \mid Z = z)$
       $+ \Pr(Y_{(j)} < Z < X_{(i)}$ for some $i$ and $j$ but no other observations fall in $[Y_{(j)}, X_{(i)}] \mid Z = z)\}$.


Similarly, we can write

(2.6)  $Q_{n_1,n_2}(z) = \sum_{i=1}^{4} Q_{n_1,n_2,i}(z)$,

where $Q_{n_1,n_2,i}(z)$ is obtained from $P_{n_1,n_2,i}(z)$ by interchanging X and Y. Note that $Q_{n_1,n_2,4}(z) = P_{n_1,n_2,4}(z)$.


To obtain asymptotic expressions, we shall assume that $0 < \lambda < \infty$, where

(2.7)  $\lambda = \lim n_2/n_1$.

The cases $\lambda = 0$ and $\lambda = \infty$ can be handled easily. We shall now obtain the limiting values of $P_{n_1,n_2,i}(z)$ and $Q_{n_1,n_2,i}(z)$.


Lemma 2.1.

(i) Either $F_1(z) > 0$ or $F_2(z) > 0$ implies $P_{n_1,n_2,1}(z) \to 0$ and $Q_{n_1,n_2,1}(z) \to 0$ as $n_1, n_2 \to \infty$.

(ii) Either $F_1(z) < 1$ or $F_2(z) < 1$ implies $P_{n_1,n_2,2}(z) \to 0$ and $Q_{n_1,n_2,2}(z) \to 0$ as $n_1, n_2 \to \infty$.
Proof. It is sufficient to prove the result only for $P_{n_1,n_2,1}(z)$ since the other results follow along similar lines.

(2.8)  $P_{n_1,n_2,1}(z) = \Pr(z \le X_{(1)} < Y_{(1)} \mid Z = z)$
       $= \int_z^\infty (1-F_2(x))^{n_2}\, n_1 (1-F_1(x))^{n_1-1}\,dF_1(x)$
       $\le (1-F_2(z))^{n_2}(1-F_1(z))^{n_1} \to 0$ as $n_1, n_2 \to \infty$.

Remark. The above assumptions hold a.e. if either $Z \sim F_1$ or $Z \sim F_2$.

Next we shall derive the limiting value of $P_{n_1,n_2,3}(z)$. Note that

(2.9)  $P_{n_1,n_2,3}(z) = \int_{-\infty}^{z}\int_{z}^{\infty} \{1-(F_2(y)-F_2(x))\}^{n_2}\, n_1(n_1-1)\{1-(F_1(y)-F_1(x))\}^{n_1-2}\,dF_1(y)\,dF_1(x)$.

For fixed $z$ and $0 < F_1(z) < 1$, $0 < F_2(z) < 1$ define

(2.10)  $H_1(y-z) = (F_1(y)-F_1(z))/(1-F_1(z))$ for $y > z$,

(2.11)  $H_2(z-x) = (F_1(z)-F_1(x))/F_1(z)$ for $x < z$,

(2.12)  $K_1(y-z) = (F_2(y)-F_2(z))/(1-F_2(z))$ for $y > z$,

(2.13)  $K_2(z-x) = (F_2(z)-F_2(x))/F_2(z)$ for $x < z$.

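Let $u = y-z$ and $v = z-x$ for $x < z < y$. Then we may write

(2.14)  $P_{n_1,n_2,3}(z) = \int_{-\infty}^{z}\int_{z}^{\infty} \{1-(F_2(y)-F_2(z))-(F_2(z)-F_2(x))\}^{n_2}$
        $\times\, n_1(n_1-1)\{1-(F_1(y)-F_1(z))-(F_1(z)-F_1(x))\}^{n_1-2}\,dF_1(y)\,dF_1(x)$

        $= \int_{-\infty}^{z}\int_{z}^{\infty} \{1-(1-F_2(z))K_1(y-z)-F_2(z)K_2(z-x)\}^{n_2}\, n_1(n_1-1)$
        $\times\,\{1-(1-F_1(z))H_1(y-z)-F_1(z)H_2(z-x)\}^{n_1-2}\,dF_1(y)\,dF_1(x)$

        $= \int_0^\infty\int_0^\infty \{1-(1-F_2(z))K_1(u)-F_2(z)K_2(v)\}^{n_2}\, n_1(n_1-1)$
        $\times\,\{1-(1-F_1(z))H_1(u)-F_1(z)H_2(v)\}^{n_1-2}(1-F_1(z))F_1(z)\,dH_1(u)\,dH_2(v)$.

Let

(2.15)  $\theta = f_2(z)/f_1(z)$.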

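We shall show that $P_{n_1,n_2,3}(z)$ is asymptotically equivalent to $P^*_{n_1,n_2,3}(z)$, where

(2.16)  $P^*_{n_1,n_2,3}(z) =$

   $\int_0^\infty\int_0^\infty \{1-\theta H_1(u)(1-F_1(z))-\theta H_2(v)F_1(z)\}^{n_2}\, n_1(n_1-1)\, \{1-(1-F_1(z))H_1(u)-F_1(z)H_2(v)\}^{n_1-2}\, (1-F_1(z))F_1(z)\, dH_1(u)\, dH_2(v)$   if $\theta \le 1$,

   $\int_0^\infty\int_0^\infty \{1-(1-F_2(z))K_1(u)-F_2(z)K_2(v)\}^{n_2}\, n_1(n_1-1)\, \{1-(1/\theta)(1-F_2(z))K_1(u)-(1/\theta)F_2(z)K_2(v)\}^{n_1-2}\, (1-F_2(z))F_2(z)\, dK_1(u)\, dK_2(v)$   if $\theta > 1$.

Next we shall show that

$P^*_{n_1,n_2,3}(z) \to (1+\lambda\theta)^{-2}$.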

For the above results we need to assume that $z$ is a continuity point of both $f_1$ and $f_2$ and that $f_1(z) > 0$, $f_2(z) > 0$. Note that $P^*_{n_1,n_2,3}(z)$ is obtained from $P_{n_1,n_2,3}(z)$ after replacing $K_1(u)$ and $K_2(v)$ in the integrand by $\theta(1-F_1(z))H_1(u)/(1-F_2(z))$ and $\theta F_1(z)H_2(v)/F_2(z)$, respectively, when $\theta \le 1$. When $\theta > 1$, $P^*_{n_1,n_2,3}(z)$ is obtained from $P_{n_1,n_2,3}(z)$ after replacing $H_1(u)$ and $H_2(v)$ in the integrand by $(1/\theta)(1-F_2(z))K_1(u)/(1-F_1(z))$ and $(1/\theta)F_2(z)K_2(v)/F_1(z)$, respectively.


We shall prove that each of $P_{n_1,n_2,3}(z)$ and $P^*_{n_1,n_2,3}(z)$ is asymptotically equivalent to the corresponding integral when the domain of integration $[0,\infty) \times [0,\infty)$ is replaced by $[0,\delta] \times [0,\delta]$ for sufficiently small $\delta > 0$. Moreover, in this domain

$K_1(u)/H_1(u) \doteq \theta(1-F_1(z))/(1-F_2(z))$

and

$K_2(v)/H_2(v) \doteq \theta F_1(z)/F_2(z)$.

Let us now prove the results described above.

Lemma 2.2. If $z$ is a continuity point of both $f_1$ and $f_2$ and $f_1(z) > 0$, $f_2(z) > 0$, then for sufficiently small $\delta > 0$ we have

(2.17)  $\lim_{n_1,n_2\to\infty} \big[ P_{n_1,n_2,3}(z) - \int_0^\delta\int_0^\delta \{1-(1-F_2(z))K_1(u)-F_2(z)K_2(v)\}^{n_2}\, n_1(n_1-1)\, \{1-(1-F_1(z))H_1(u)-F_1(z)H_2(v)\}^{n_1-2}\, (1-F_1(z))F_1(z)\, dH_1(u)\, dH_2(v) \big] = 0$.

Proof. Since $H_1$ and $H_2$ are non-decreasing functions in $u$ and $v$, we have

$\int_0^\delta\int_\delta^\infty \{1-(1-F_2(z))K_1(u)-F_2(z)K_2(v)\}^{n_2}\, n_1(n_1-1)\, \{1-(1-F_1(z))H_1(u)-F_1(z)H_2(v)\}^{n_1-2}\, (1-F_1(z))F_1(z)\, dH_1(u)\, dH_2(v)$

$\le n_1(n_1-1)\{1-(1-F_1(z))H_1(\delta)\}^{n_1-2} \to 0$ as $n_1 \to \infty$,

because $1-(1-F_1(z))H_1(\delta) < 1$ for sufficiently small $\delta > 0$. The other two integrals, $\int_\delta^\infty\int_0^\delta$ and $\int_\delta^\infty\int_\delta^\infty$, can be similarly proved to be asymptotically zero, and the proof is complete.

Remark. Note that only $n_1 \to \infty$ is required to obtain the desired result.

Before commencing the next lemma, we need to develop some useful facts.

Using the definition of a density at its point of continuity, we get

(2.18)  $\lim_{u\to 0} K_1(u)/u = \lim_{u\to 0} \{(F_2(z+u)-F_2(z))/u\}/(1-F_2(z)) = f_2(z)/(1-F_2(z))$,

(2.19)  $\lim_{u\to 0} H_1(u)/u = \lim_{u\to 0} \{(F_1(z+u)-F_1(z))/u\}/(1-F_1(z)) = f_1(z)/(1-F_1(z))$,

provided $f_1(z) > 0$ and $f_2(z) > 0$.

(2.18) and (2.19) entail

(2.20)  $\lim_{u\to 0} K_1(u)/H_1(u) = \theta(1-F_1(z))/(1-F_2(z))$.

Similarly,

(2.21)  $\lim_{v\to 0} K_2(v)/H_2(v) = \theta F_1(z)/F_2(z)$.
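The limit (2.20) can be checked numerically. The sketch below is only an illustration under assumptions that are not in the text: both populations are taken to be unit-variance normal with hypothetical means `mu1` and `mu2`, and the helper names are invented for the example.

```python
import math


def norm_pdf(x, mu):
    """Density of N(mu, 1); the normal model is an assumption of this sketch."""
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)


def norm_cdf(x, mu):
    """Distribution function of N(mu, 1)."""
    return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2.0)))


def ratio_k1_h1(z, u, mu1, mu2):
    """K_1(u)/H_1(u) computed from definitions (2.10) and (2.12)."""
    h1 = (norm_cdf(z + u, mu1) - norm_cdf(z, mu1)) / (1.0 - norm_cdf(z, mu1))
    k1 = (norm_cdf(z + u, mu2) - norm_cdf(z, mu2)) / (1.0 - norm_cdf(z, mu2))
    return k1 / h1


def limit_2_20(z, mu1, mu2):
    """Right-hand side of (2.20): theta (1 - F_1(z)) / (1 - F_2(z))."""
    theta = norm_pdf(z, mu2) / norm_pdf(z, mu1)
    return theta * (1.0 - norm_cdf(z, mu1)) / (1.0 - norm_cdf(z, mu2))


if __name__ == "__main__":
    for u in (1e-1, 1e-2, 1e-3, 1e-4):
        print(u, ratio_k1_h1(0.3, u, 0.0, 1.0), limit_2_20(0.3, 0.0, 1.0))
```

As `u` shrinks, the printed ratio should settle on the value returned by `limit_2_20`.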


If we write

(2.22)  $K_1(u) = \theta(1-F_1(z))H_1(u)/(1-F_2(z)) + R_1(u)/(1-F_2(z))$,

(2.23)  $K_2(v) = \theta F_1(z)H_2(v)/F_2(z) + R_2(v)/F_2(z)$,

then $R_1(u)$ and $R_2(v)$ have the following property:

$\lim_{u\to 0} R_1(u)/H_1(u) = \lim_{v\to 0} R_2(v)/H_2(v) = 0$,

which is equivalent to the statement that for every $\varepsilon > 0$ there exists a $\delta > 0$ such that

(2.24)  $|R_1(u)| < \varepsilon H_1(u)$, $|R_2(v)| < \varepsilon H_2(v)$ whenever $|u| < \delta$, $|v| < \delta$.

Lemma 2.3. Suppose $z$ is a continuity point of both $f_1$ and $f_2$ and $f_1(z) > 0$, $f_2(z) > 0$. If $0 < \lambda < \infty$, then

$\lim_{n_1,n_2\to\infty} [P_{n_1,n_2,3}(z) - P^*_{n_1,n_2,3}(z)] = 0$.

Proof. We shall only prove the case $\theta \le 1$, since the case $\theta > 1$ can be proved similarly by switching the roles of $H_1(u)$, $H_2(v)$ with $K_1(u)$, $K_2(v)$, respectively.

Obviously Lemma 2.2 is also true for $P^*_{n_1,n_2,3}(z)$. Therefore we need only prove that for sufficiently small $\delta > 0$

$\lim_{n_1,n_2\to\infty} \big[ \int_0^\delta\int_0^\delta \{1-(1-F_2(z))K_1(u)-F_2(z)K_2(v)\}^{n_2}\, n_1(n_1-1)\, \{1-(1-F_1(z))H_1(u)-F_1(z)H_2(v)\}^{n_1-2}\, (1-F_1(z))F_1(z)\, dH_1(u)\, dH_2(v)$

$- \int_0^\delta\int_0^\delta \{1-\theta(1-F_1(z))H_1(u)-\theta F_1(z)H_2(v)\}^{n_2}\, n_1(n_1-1)\, \{1-(1-F_1(z))H_1(u)-F_1(z)H_2(v)\}^{n_1-2}\, (1-F_1(z))F_1(z)\, dH_1(u)\, dH_2(v) \big] = 0$.

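Now using (2.22) and (2.23), we can write

(2.25)  $\int_0^\delta\int_0^\delta \{1-(1-F_2(z))K_1(u)-F_2(z)K_2(v)\}^{n_2}\, n_1(n_1-1)\, \{1-(1-F_1(z))H_1(u)-F_1(z)H_2(v)\}^{n_1-2}\, (1-F_1(z))F_1(z)\, dH_1(u)\, dH_2(v)$

        $= \int_0^\delta\int_0^\delta \{1-[\theta(1-F_1(z))H_1(u)+R_1(u)]-[\theta F_1(z)H_2(v)+R_2(v)]\}^{n_2}\, n_1(n_1-1)\, \{1-(1-F_1(z))H_1(u)-F_1(z)H_2(v)\}^{n_1-2}\, (1-F_1(z))F_1(z)\, dH_1(u)\, dH_2(v)$.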

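From the mean-value theorem, for some $0 < \gamma_1 < 1$, $0 < \gamma_2 < 1$, (2.25) is

(2.26)  $\int_0^\delta\int_0^\delta \{1-\theta(1-F_1(z))H_1(u)-\theta F_1(z)H_2(v)\}^{n_2}\, n_1(n_1-1)\, \{1-(1-F_1(z))H_1(u)-F_1(z)H_2(v)\}^{n_1-2}\, (1-F_1(z))F_1(z)\, dH_1(u)\, dH_2(v)$

        $- \int_0^\delta\int_0^\delta R_2(v)\, n_2\, \{1-\theta(1-F_1(z))H_1(u)-\theta F_1(z)H_2(v)-\gamma_2 R_2(v)\}^{n_2-1}\, n_1(n_1-1)\, \{1-(1-F_1(z))H_1(u)-F_1(z)H_2(v)\}^{n_1-2}\, (1-F_1(z))F_1(z)\, dH_1(u)\, dH_2(v)$

        $- \int_0^\delta\int_0^\delta R_1(u)\, n_2\, \{1-\theta(1-F_1(z))H_1(u)-\gamma_1 R_1(u)-\theta F_1(z)H_2(v)-R_2(v)\}^{n_2-1}\, n_1(n_1-1)\, \{1-(1-F_1(z))H_1(u)-F_1(z)H_2(v)\}^{n_1-2}\, (1-F_1(z))F_1(z)\, dH_1(u)\, dH_2(v)$.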


The lemma is proved if we can show that in (2.26) the second and third terms are asymptotically zero. Denote the second term by $a_{n_1,n_2}(z)$ and the third by $b_{n_1,n_2}(z)$; then

$|a_{n_1,n_2}(z)| \le \int_0^\delta\int_0^\delta |R_2(v)|\, n_2\, n_1(n_1-1)\, \{1-(1-F_1(z))H_1(u)-F_1(z)H_2(v)\}^{n_1-2}\, (1-F_1(z))F_1(z)\, dH_1(u)\, dH_2(v)$.


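From (2.24), for every $\varepsilon > 0$ we can choose sufficiently small $\delta > 0$ such that

(2.27)  $|a_{n_1,n_2}(z)| \le \varepsilon \int_0^\delta\int_0^\delta H_2(v)\, n_2\, n_1(n_1-1)\, \{1-(1-F_1(z))H_1(u)-F_1(z)H_2(v)\}^{n_1-2}\, (1-F_1(z))F_1(z)\, dH_1(u)\, dH_2(v)$

        $\le \varepsilon \int_0^\infty\int_0^\infty H_2(v)\, n_2\, n_1(n_1-1)\, \{1-(1-F_1(z))H_1(u)-F_1(z)H_2(v)\}^{n_1-2}\, (1-F_1(z))F_1(z)\, dH_1(u)\, dH_2(v)$

        $= \varepsilon \int_0^1\int_0^1 y\, n_2\, n_1(n_1-1)\, \{1-(1-F_1(z))x-F_1(z)y\}^{n_1-2}\, (1-F_1(z))F_1(z)\, dx\, dy$.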


Considering the transformation

$s = (1-F_1(z))x + F_1(z)y$,
$t = F_1(z)y$,

we then have
Case I. $\lambda\theta < 1$.

$P^*_{n_1,n_2,3}(z) = \int_0^1 n_1(n_1-1)\, s\, \{\sum_{k=0}^{n_2} \binom{n_2}{k}(-\theta)^k s^k\}\, (1-s)^{n_1-2}\, ds$

$= n_1(n_1-1) \sum_{k=0}^{n_2} \binom{n_2}{k}(-\theta)^k\, (k+1)!\,(n_1-2)!/(n_1+k)!$

$= \sum_{k=0}^{n_2} [\,n_1!\, n_2!\, (k+1)/((n_2-k)!\,(n_1+k)!)\,](-\theta)^k$

$= \sum_{k=0}^{n_2} (-1)^k v_{k,n_2}$.


This is an alternating series, and

$v_{k+1,n_2}/v_{k,n_2} = ((n_2-k)/(n_1+k+1))\,((k+2)/(k+1))\,\theta \le (n_2/n_1)\,((k+2)/(k+1))\,\theta$.

Since $\lambda\theta < 1$, we can choose $n_1$ and $k$ sufficiently large such that $v_{k,n_2}$ is decreasing in $k$. Moreover, $v_{k,n_2} \to 0$ as $k \to \infty$. Therefore

$\lim P^*_{n_1,n_2,3}(z) = \sum_{k=0}^{\infty} (k+1)(-\lambda\theta)^k = 1/(1+\lambda\theta)^2$.
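The finite alternating sum of Case I can be evaluated directly and compared with the stated limit. The following is a minimal sketch: the sample sizes and the value of $\theta$ are hypothetical, and the recursion simply applies the ratio $v_{k+1,n_2}/v_{k,n_2}$ given above, starting from $v_{0,n_2} = 1$.

```python
def alternating_sum(n1, n2, theta):
    """Evaluate sum_{k=0}^{n2} (-1)^k v_{k,n2} using the ratio recursion for v."""
    total = 0.0
    v = 1.0  # v_{0,n2} = n1! n2! / (n2! n1!) = 1
    for k in range(n2 + 1):
        total += v if k % 2 == 0 else -v
        # v_{k+1}/v_k = ((n2 - k)/(n1 + k + 1)) * ((k + 2)/(k + 1)) * theta
        v *= (n2 - k) / (n1 + k + 1) * (k + 2) / (k + 1) * theta
    return total


def case_one_limit(lam, theta):
    """Limiting value 1/(1 + lambda*theta)^2 stated for lambda*theta < 1."""
    return 1.0 / (1.0 + lam * theta) ** 2


if __name__ == "__main__":
    theta = 0.8
    for n1 in (50, 500, 4000):
        n2 = n1 // 2  # keeps n2/n1 = 0.5, so lambda*theta = 0.4
        print(n1, alternating_sum(n1, n2, theta), case_one_limit(0.5, theta))
```

The gap between the two printed columns should shrink roughly like $1/n_1$.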
k=0 


Case II. $\lambda\theta > 1$.

$P^*_{n_1,n_2,3}(z) = \int_0^1 n_1(n_1-1)\, s\,(1-\theta s)^{n_2}(1-s)^{n_1-2}\, ds$

$= \int_0^{\theta} n_1(n_1-1)\,(y/\theta)\,(1-y)^{n_2}\,(1-y/\theta)^{n_1-2}\,(1/\theta)\, dy$.

Then 


Dn  n = 02(Pn  n ^z)_Rn  n ,(*)). 

n2  >J  * n2  * ^ * n2  * 3 


|d“i-'2-3U)I  5 


nl"2 


n -2 


+ E ^(^-DC  2k")(i/s)kJ‘1(i-y)V+1dy 


V2 


k=M 
n„-2 


Since  E n1(n.-l)(  )(l/0)  J*  (l-y)  ^y^+^dy  converges,  we  can 

k=0  1 K 0 


I 


choose $M$ sufficiently large so that, for every given $\varepsilon > 0$, the second term on the right side of the above inequality is dominated by $\varepsilon$. Hence for every $\varepsilon > 0$, there exists $M$ such that

lim  lDn  n < lim  S n1(n1-l)(  ^ )(l/8)kJ*  (l-y)  2* 

W3  k«0  1 1 k e 

k+l  *1  . n, -2  ,,  n +1 

y dy  + c < limSn](n1-l)(  1 )(l/0)k(l-0)  2 +e  = e 

k=0  k 

Thus $\lim D_{n_1,n_2,3}(z) = 0$ since $\varepsilon$ is arbitrary.

Case III. $\lambda\theta = 1$.

We have $\lim P^*_{n_1,n_2,3}(z) = \sum_{k=0}^{\infty} (k+1)(-1)^k$.

Then by Abel's method of summability (see Widder (1961), pp. 309-313), we get

$\lim P^*_{n_1,n_2,3}(z) = \lim_{x\to 1^-} \sum_{k=0}^{\infty} (k+1)(-x)^k = \lim_{x\to 1^-} 1/(1+x)^2 = 1/4 = 1/(1+\lambda\theta)^2$.

Now the proof of Lemma 2.4 is complete.
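The Abel-summation step in Case III can be illustrated numerically. This is a small sketch, not part of the original argument: it truncates the power series at a hypothetical number of terms and compares it with the closed form $1/(1+x)^2$ as $x$ approaches 1 from below.

```python
def abel_series(x, terms=100_000):
    """Truncated sum_{k>=0} (k+1) (-x)^k for 0 <= x < 1."""
    total = 0.0
    power = 1.0  # holds (-x)^k
    for k in range(terms):
        total += (k + 1) * power
        power *= -x
    return total


if __name__ == "__main__":
    for x in (0.9, 0.99, 0.999):
        print(x, abel_series(x), 1.0 / (1.0 + x) ** 2)
```

Both columns should move toward 1/4, the value assigned to the divergent series $\sum (k+1)(-1)^k$.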

Lemma 2.5. Under the assumptions of Lemma 2.3, we have

(2.29)  $\lim_{n_1,n_2\to\infty} P_{n_1,n_2,4}(z) = \lambda\theta/(1+\lambda\theta)^2$.
Proof. Going through the same arguments as we did for $P_{n_1,n_2,3}(z)$ (Lemma 2.2 through Lemma 2.4), we have

$P_{n_1,n_2,4}(z) = \tfrac{1}{2}\int_{-\infty}^{z}\int_{z}^{\infty} n_1 n_2\, \{1-(F_2(y)-F_2(x))\}^{n_2-1}\, \{1-(F_1(y)-F_1(x))\}^{n_1-1}\, (dF_1(y)\,dF_2(x) + dF_2(y)\,dF_1(x))$

$= \tfrac{1}{2}\int_0^\infty\int_0^\infty n_1 n_2\, \{1-(1-F_2(z))K_1(u)-F_2(z)K_2(v)\}^{n_2-1}\, \{1-(1-F_1(z))H_1(u)-F_1(z)H_2(v)\}^{n_1-1} \times \{(1-F_2(z))F_1(z)\, dK_1(u)\, dH_2(v) + (1-F_1(z))F_2(z)\, dH_1(u)\, dK_2(v)\}$

$\doteq \theta\,(n_2/(n_1+1)) \int_0^\infty\int_0^\infty (n_1+1)n_1\, \{1-\theta(1-F_1(z))H_1(u)-\theta F_1(z)H_2(v)\}^{n_2-1}\, \{1-(1-F_1(z))H_1(u)-F_1(z)H_2(v)\}^{n_1-1}\, (1-F_1(z))F_1(z)\, dH_1(u)\, dH_2(v)$

$\to \lambda\theta\,(1/(1+\lambda\theta)^2) = \lambda\theta/(1+\lambda\theta)^2$ as $n_1, n_2 \to \infty$.

Combining all the previous results, we have the following theorem.

Theorem 2.1. Suppose $z$ is a continuity point of both $f_1$ and $f_2$ and $f_1(z) > 0$, $f_2(z) > 0$. If $0 < \lambda < \infty$, then

(2.30)  $\lim_{n_1,n_2\to\infty} P_{n_1,n_2}(z) = f_1(z)/(f_1(z)+\lambda f_2(z))$,

        $\lim_{n_1,n_2\to\infty} Q_{n_1,n_2}(z) = \lambda f_2(z)/(f_1(z)+\lambda f_2(z))$.


Proof. From Lemma 2.1, Lemma 2.4, and Lemma 2.5, we have

$\lim P_{n_1,n_2}(z) = \lim \sum_{i=1}^{4} P_{n_1,n_2,i}(z) = 1/(1+\lambda\theta)^2 + \lambda\theta/(1+\lambda\theta)^2 = 1/(1+\lambda\theta) = f_1(z)/(f_1(z)+\lambda f_2(z))$.


Let $\pi_{n_1,n_2,1}$ and $\pi_{n_1,n_2,2}$ be the PMC when $Z$ comes from $\pi_1$ or $\pi_2$, respectively. If $\Pr(f_i(z) > 0,\ f_i \text{ is continuous at } z \mid \pi_j) = 1$, $i = 1, 2$, $j = 1, 2$, then the following are immediate consequences of Theorem 2.1.

(2.31)  $\lim_{n_1,n_2\to\infty} \pi_{n_1,n_2,1} = \int [\lambda f_2(z)/(f_1(z)+\lambda f_2(z))]\, f_1(z)\, dz = \int [\lambda f_1(z)f_2(z)/(f_1(z)+\lambda f_2(z))]\, dz$,

(2.32)  $\lim_{n_1,n_2\to\infty} \pi_{n_1,n_2,2} = \int [f_1(z)f_2(z)/(f_1(z)+\lambda f_2(z))]\, dz$.


Suppose the training samples are drawn from a population which is a mixture of $\pi_1$ and $\pi_2$ in the proportions $\xi_1$ and $\xi_2$. Then $\lambda = \xi_2/\xi_1$ and the asymptotic risk of the MNN rule (assuming the cost of misclassification is 1) is

(2.33)  $R = \xi_1 \int [(\xi_2/\xi_1) f_1(z)f_2(z)/(f_1(z)+(\xi_2/\xi_1)f_2(z))]\, dz + \xi_2 \int [f_1(z)f_2(z)/(f_1(z)+(\xi_2/\xi_1)f_2(z))]\, dz$

        $= 2\int [\xi_1\xi_2 f_1(z)f_2(z)/(\xi_1 f_1(z)+\xi_2 f_2(z))]\, dz$

        $\le 2\int \min\{\xi_1 f_1(z),\ \xi_2 f_2(z)\}\, dz = 2R^*$,

where $R^*$ is the Bayes risk with respect to the prior probabilities $\xi_1$ and $\xi_2$; namely, the asymptotic probability of error of the MNN rule is bounded above by twice the Bayes probability of error.
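The bound in (2.33) can be illustrated by numerical integration. The sketch below assumes, purely for illustration, two unit-variance normal populations with hypothetical means and a simple trapezoid rule; none of these choices come from the text.

```python
import math


def norm_pdf(x, mu):
    """Density of N(mu, 1); an assumed model for this illustration."""
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)


def integrate(g, lo=-12.0, hi=13.0, steps=50_000):
    """Composite trapezoid rule for g on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (g(lo) + g(hi))
    for i in range(1, steps):
        total += g(lo + i * h)
    return total * h


def mnn_asymptotic_risk(xi1, mu1, mu2):
    """R of (2.33): 2 * integral of xi1 xi2 f1 f2 / (xi1 f1 + xi2 f2)."""
    xi2 = 1.0 - xi1

    def g(z):
        a, b = xi1 * norm_pdf(z, mu1), xi2 * norm_pdf(z, mu2)
        return 2.0 * a * b / (a + b) if a + b > 0.0 else 0.0

    return integrate(g)


def bayes_risk(xi1, mu1, mu2):
    """R*: integral of min(xi1 f1, xi2 f2)."""
    xi2 = 1.0 - xi1
    return integrate(lambda z: min(xi1 * norm_pdf(z, mu1), xi2 * norm_pdf(z, mu2)))


if __name__ == "__main__":
    r, r_star = mnn_asymptotic_risk(0.5, 0.0, 1.5), bayes_risk(0.5, 0.0, 1.5)
    print("R =", r, " R* =", r_star, " 2R* =", 2.0 * r_star)
```

For any such choice the printed values should satisfy $R^* \le R \le 2R^*$.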


2.2  An alternative approach to obtain the asymptotic conditional PMC of the MNN rule.


Let $U$ be the left nearest neighbor of $Z$ and $V$ the right nearest neighbor of $Z$. Then the conditional probability of classifying $Z$ into $\pi_1$, given $Z = z$, $U = u$, $V = v$, is

(2.34)  $P_{n_1,n_2}(z,u,v) = A(n_1,n_2)/B(n_1,n_2)$,

where

$A(n_1,n_2) = C_1(n_1,n_2) + C_3(n_1,n_2) + C_5(n_1,n_2) + \tfrac{1}{2} C_7(n_1,n_2)$,

$B(n_1,n_2) = \sum_{i=1}^{7} C_i(n_1,n_2)$,

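and

$C_1(n_1,n_2) = n_1 [1-F_1(v)]^{n_1-1} [1-F_2(v)]^{n_2} f_1(v)$,

$C_2(n_1,n_2) = n_2 [1-F_2(v)]^{n_2-1} [1-F_1(v)]^{n_1} f_2(v)$,

$C_3(n_1,n_2) = n_1 [F_1(u)]^{n_1-1} [F_2(u)]^{n_2} f_1(u)$,

$C_4(n_1,n_2) = n_2 [F_2(u)]^{n_2-1} [F_1(u)]^{n_1} f_2(u)$,

$C_5(n_1,n_2) = n_1(n_1-1) [1-(F_1(v)-F_1(u))]^{n_1-2} [1-(F_2(v)-F_2(u))]^{n_2} f_1(u) f_1(v)$,

$C_6(n_1,n_2) = n_2(n_2-1) [1-(F_2(v)-F_2(u))]^{n_2-2} [1-(F_1(v)-F_1(u))]^{n_1} f_2(u) f_2(v)$,

$C_7(n_1,n_2) = n_1 n_2 [1-(F_1(v)-F_1(u))]^{n_1-1} [1-(F_2(v)-F_2(u))]^{n_2-1} [f_1(v)f_2(u) + f_1(u)f_2(v)]$;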

here $C_1(n_1,n_2)$, $C_2(n_1,n_2)$, $C_3(n_1,n_2)$, $C_4(n_1,n_2)$, $C_5(n_1,n_2)$, $C_6(n_1,n_2)$, and $C_7(n_1,n_2)$ divided by the conditional joint density of $U$ and $V$ given $Z = z$ are the conditional probabilities of the events

$\{Z < X_{(1)} < Y_{(1)}\}$,
$\{Z < Y_{(1)} < X_{(1)}\}$,
$\{Z \ge X_{(n_1)} > Y_{(n_2)}\}$,
$\{Z \ge Y_{(n_2)} > X_{(n_1)}\}$,
$\{X_{(i)} \le Z < X_{(i+1)}$ for some $i = 1, \ldots, n_1-1$, and no $Y_j$'s fall in $[X_{(i)}, X_{(i+1)}]\}$,
$\{Y_{(j)} \le Z < Y_{(j+1)}$ for some $j = 1, \ldots, n_2-1$, and no $X_i$'s fall in $[Y_{(j)}, Y_{(j+1)}]\}$,
$\{X_{(i)} \le Z < Y_{(j)}$ for some $i$ and $j$ and no other observations fall in $[X_{(i)}, Y_{(j)}]$, or $Y_{(j)} \le Z < X_{(i)}$ for some $i$ and $j$ and no other observations fall in $[Y_{(j)}, X_{(i)}]\}$,

respectively, given $Z = z$, $U = u$, and $V = v$.

We begin with the following lemma.

Lemma 2.6. Either $f_1$ is continuous at $z$ and $f_1(z) > 0$, or $f_2$ is continuous at $z$ and $f_2(z) > 0$, implies that $U$ and $V$ converge to $z$ in probability as $n_1, n_2 \to \infty$.


Proof. By symmetry, it suffices to show that $U$ converges to $z$ in probability.

For every sufficiently small $\varepsilon > 0$,

$\Pr\{Z-U > \varepsilon \mid Z = z\} = \Pr\{U < z-\varepsilon \mid Z = z\}$

$= \{1-(F_1(z)-F_1(z-\varepsilon))\}^{n_1} \{1-(F_2(z)-F_2(z-\varepsilon))\}^{n_2} \to 0$

as $n_1, n_2 \to \infty$, since either $1-(F_1(z)-F_1(z-\varepsilon)) < 1$ or $1-(F_2(z)-F_2(z-\varepsilon)) < 1$.

An alternative proof of Theorem 2.1:

First we can easily see that $C_1(n_1,n_2)$, $C_2(n_1,n_2)$, $C_3(n_1,n_2)$, and $C_4(n_1,n_2)$ converge to zero in probability as $n_1, n_2 \to \infty$, since $0 < F_1(z) < 1$, $0 < F_2(z) < 1$, $U$ and $V$ converge to $z$ in probability (Lemma 2.6), and the density functions are continuous.

Thus

$\mathrm{plim}\,(A(n_1,n_2)/B(n_1,n_2)) = \mathrm{plim}\,(C_5(n_1,n_2) + \tfrac{1}{2}C_7(n_1,n_2))/[C_5(n_1,n_2) + C_6(n_1,n_2) + C_7(n_1,n_2)]$.
$f_1(v) + (n_2(n_2-1)/n_1^2)\,[1-(F_1(v)-F_1(u))]^2 f_2(u) f_2(v) + (n_2/n_1)\,[1-(F_1(v)-F_1(u))]\,[1-(F_2(v)-F_2(u))]\,[f_1(v)f_2(u) + f_1(u)f_2(v)]\}$


Hence, by the same reasons as we stated above, we have

(2.37)  $\mathrm{plim}\,(A(n_1,n_2)/B(n_1,n_2)) = (f_1^2(z) + \lambda f_1(z)f_2(z))/(f_1^2(z) + \lambda^2 f_2^2(z) + 2\lambda f_1(z)f_2(z)) = f_1(z)/(f_1(z)+\lambda f_2(z))$;

namely,

(2.38)  $\mathrm{plim}\, P_{n_1,n_2}(z,U,V) = f_1(z)/(f_1(z)+\lambda f_2(z))$.

Ii  j j 1 * <- 


Therefore, by the Lebesgue dominated convergence theorem, we obtain

$\lim P_{n_1,n_2}(z) = \lim E\, P_{n_1,n_2}(z,U,V) = E \lim P_{n_1,n_2}(z,U,V) = E[f_1(z)/(f_1(z)+\lambda f_2(z))] = f_1(z)/(f_1(z)+\lambda f_2(z))$.

Also,

$\lim Q_{n_1,n_2}(z) = \lim (1-P_{n_1,n_2}(z)) = \lambda f_2(z)/(f_1(z)+\lambda f_2(z))$.

The proof is now complete.
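A small Monte Carlo experiment can illustrate Theorem 2.1. The sketch below makes several assumptions that are not stated in this section: the MNN rule is read off from $A(n_1,n_2) = C_1 + C_3 + C_5 + \tfrac{1}{2}C_7$ (classify into $\pi_1$ when every available nearest neighbor on the two sides is an $X$, toss a fair coin when one is an $X$ and the other a $Y$), the populations are unit-variance normals with hypothetical means, and the coin toss is replaced by its expected value 1/2 to reduce simulation noise.

```python
import math
import random


def normal_pdf(x, mu):
    """Density of N(mu, 1); the normal model is an assumption of this sketch."""
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)


def mnn_vote(z, xs, ys):
    """Weight given to population 1 by the assumed MNN rule: 1, 0, or 0.5 on a tie."""
    pooled = [(x, 1) for x in xs] + [(y, 2) for y in ys]
    left = max((p for p in pooled if p[0] <= z), default=None)   # nearest neighbor U
    right = min((p for p in pooled if p[0] > z), default=None)   # nearest neighbor V
    labels = [p[1] for p in (left, right) if p is not None]
    return sum(1 for lab in labels if lab == 1) / len(labels)


def simulate(z, n1, n2, mu1, mu2, trials, seed=0):
    """Monte Carlo estimate of P_{n1,n2}(z) under the assumed rule."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        xs = [rng.gauss(mu1, 1.0) for _ in range(n1)]
        ys = [rng.gauss(mu2, 1.0) for _ in range(n2)]
        total += mnn_vote(z, xs, ys)
    return total / trials


def theorem_2_1_limit(z, n1, n2, mu1, mu2):
    """f1(z) / (f1(z) + lambda f2(z)) with lambda approximated by n2/n1."""
    lam = n2 / n1
    f1, f2 = normal_pdf(z, mu1), normal_pdf(z, mu2)
    return f1 / (f1 + lam * f2)


if __name__ == "__main__":
    print(simulate(0.3, 200, 100, 0.0, 1.0, 2000),
          theorem_2_1_limit(0.3, 200, 100, 0.0, 1.0))
```

The two printed numbers should agree up to Monte Carlo error of a few hundredths.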
2.3  The asymptotic conditional PMC of the multi-stage MNN rule.

The following lemma leads us to assume without loss of generality (with probability one) that the $k$ left nearest neighbors $U_1, U_2, \ldots, U_k$ and the $k$ right nearest neighbors $V_1, V_2, \ldots, V_k$ are well-defined in the K-stage MNN rule. Let

(2.39)  $n = \min(n_1, n_2)$.

Lemma 2.7. If $k/n \to 0$ as $n \to \infty$, then

(i)  Pr{There are at least $k$ observations to the right of $Z$ for sufficiently large $n$} = 1,

(ii)  Pr{There are at least $k$ observations to the left of $Z$ for sufficiently large $n$} = 1.

Proof. We shall prove (i). Since continuity of the distribution functions is assumed and it is known that either $Z \sim F_1$ or $Z \sim F_2$, it is then true with probability one that either $0 < F_1(z) < 1$ or $0 < F_2(z) < 1$. Suppose $0 < F_1(z) < 1$, and define

$W_i = \chi_{(z,\infty)}(X_i)$,  $i = 1, \ldots, n_1$.

Then $E(W_i) = 1 - F_1(z) > 0$. By the strong law of large numbers, we have

$\Pr\{(1/n_1) \sum_{i=1}^{n_1} W_i \to E(W_1) > 0 \text{ as } n_1 \to \infty\} = 1$.

Now since $k/n_1 \to 0$ as $n_1 \to \infty$ and $E(W_1) > 0$, there exists an integer $N$ such that

$k/n_1 < E(W_1)$ for $n_1 > N$.

Consequently,

$\Pr\{\sum_{i=1}^{n_1} W_i > k \text{ for } n \text{ sufficiently large}\} = 1$,

which completes the proof.

Let $P^{(k)}_{n_1,n_2}(z; u_1, \ldots, u_k, v_1, \ldots, v_k)$ (or $Q^{(k)}_{n_1,n_2}(z; u_1, \ldots, u_k, v_1, \ldots, v_k)$) be the conditional probability of the K-stage MNN rule classifying $Z$ into $\pi_1$ (or $\pi_2$), given $Z = z$, $U_i = u_i$, $V_i = v_i$, for $i = 1, \ldots, k$. Let $P^{(k)}_{n_1,n_2}(z)$ (or $Q^{(k)}_{n_1,n_2}(z)$) be the conditional probability of the K-stage MNN rule classifying $Z$ into $\pi_1$ (or $\pi_2$), given $Z = z$. The limiting value of $P^{(k)}_{n_1,n_2}(z)$ is obtained through the limiting value of $P^{(k)}_{n_1,n_2}(z; u_1, \ldots, u_k; v_1, \ldots, v_k)$ by using the following lemma.


Lemma 2.8. Suppose either $f_1$ is continuous at $z$ with $f_1(z) > 0$ or $f_2$ is continuous at $z$ with $f_2(z) > 0$. If $k/n \to 0$ as $n \to \infty$, then

$U_j \to z$, $V_j \to z$ in probability as $n \to \infty$ for $j = 1, \ldots, k$.


Proof. We shall only prove that $U_k \to z$ in probability. Suppose $f_1$ is continuous at $z$ and $f_1(z) > 0$; then for every sufficiently small $\varepsilon > 0$,

$\Pr\{Z - U_k > \varepsilon \mid Z = z\}$

$\le$ Pr{There are at most $(k-1)$ $X$ observations lying in the interval $(z-\varepsilon, z)$}

$= \sum_{i=0}^{k-1} \binom{n_1}{i} q^i (1-q)^{n_1-i}$

$= \Pr\{W_{n_1}/n_1 < k/n_1 \mid Z = z\}$,

where $W_{n_1}$ is the number of $X$ observations lying in the interval $(z-\varepsilon, z)$ and $q = F_1(z) - F_1(z-\varepsilon)$.

Since, by the law of large numbers,

$W_{n_1}/n_1 \to F_1(z) - F_1(z-\varepsilon) > 0$ a.s. and $k/n_1 \to 0$,

we immediately have

$\Pr\{Z - U_k > \varepsilon \mid Z = z\} \to 0$ as $n \to \infty$

for every sufficiently small $\varepsilon > 0$, which completes the proof.
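The binomial tail that bounds $\Pr\{Z - U_k > \varepsilon \mid Z = z\}$ can be computed exactly. In the sketch below the interval probability `q` and the growth rate $k = \lfloor\sqrt{n_1}\rfloor$ are hypothetical choices, picked only so that $k/n_1 \to 0$.

```python
import math


def prob_at_most(k_minus_1, n, q):
    """Pr{Binomial(n, q) <= k - 1}: at most k - 1 points fall in (z - eps, z)."""
    return sum(
        math.comb(n, i) * q ** i * (1.0 - q) ** (n - i)
        for i in range(k_minus_1 + 1)
    )


def tail_for(n, q=0.05):
    """Bound for sample size n with k = floor(sqrt(n)) (an assumed growth rate)."""
    k = math.isqrt(n)
    return prob_at_most(k - 1, n, q)


if __name__ == "__main__":
    for n in (100, 1000, 4000):
        print(n, math.isqrt(n), tail_for(n))
```

The bound stays near 1 while $k$ is comparable to $n_1 q$ and then drops quickly once $n_1 q$ dominates $k$.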


Define

(2.40)  $D_1(n_1,n_2)$

   $= n_1(n_1-1)(n_1-2)(n_1-3)\,[1-(F_1(v_2)-F_1(u_2))]^{n_1-4}\,[1-(F_2(v_2)-F_2(u_2))]^{n_2}\, f_1(u_1)f_1(v_1)f_1(u_2)f_1(v_2)$

   $+\ n_1(n_1-1)(n_1-2)\,n_2\,[1-(F_1(v_2)-F_1(u_2))]^{n_1-3}\,[1-(F_2(v_2)-F_2(u_2))]^{n_2-1} \times [f_1(u_1)f_1(v_1)f_1(u_2)f_2(v_2) + f_1(u_1)f_1(v_1)f_2(u_2)f_1(v_2) + f_1(u_1)f_2(v_1)f_1(u_2)f_1(v_2) + f_2(u_1)f_1(v_1)f_1(u_2)f_1(v_2)]$

   $+\ n_1(n_1-1)\,n_2(n_2-1)\,[1-(F_1(v_2)-F_1(u_2))]^{n_1-2}\,[1-(F_2(v_2)-F_2(u_2))]^{n_2-2}\, f_1(u_1)f_1(v_1)f_2(u_2)f_2(v_2)$,

(2.41)  $D_2(n_1,n_2)$

   $= n_2(n_2-1)(n_2-2)(n_2-3)\,[1-(F_1(v_2)-F_1(u_2))]^{n_1}\,[1-(F_2(v_2)-F_2(u_2))]^{n_2-4}\, f_2(u_1)f_2(v_1)f_2(u_2)f_2(v_2)$

   $+\ n_2(n_2-1)(n_2-2)\,n_1\,[1-(F_1(v_2)-F_1(u_2))]^{n_1-1}\,[1-(F_2(v_2)-F_2(u_2))]^{n_2-3} \times [f_2(u_1)f_2(v_1)f_2(u_2)f_1(v_2) + f_2(u_1)f_2(v_1)f_1(u_2)f_2(v_2) + f_2(u_1)f_1(v_1)f_2(u_2)f_2(v_2) + f_1(u_1)f_2(v_1)f_2(u_2)f_2(v_2)]$

   $+\ n_1(n_1-1)\,n_2(n_2-1)\,[1-(F_1(v_2)-F_1(u_2))]^{n_1-2}\,[1-(F_2(v_2)-F_2(u_2))]^{n_2-2}\, f_2(u_1)f_2(v_1)f_1(u_2)f_1(v_2)$,

(2.42)  $D_3(n_1,n_2)$

   $= n_1(n_1-1)\,n_2(n_2-1)\,[1-(F_1(v_2)-F_1(u_2))]^{n_1-2}\,[1-(F_2(v_2)-F_2(u_2))]^{n_2-2} \times [f_1(u_1)f_2(v_1)f_1(u_2)f_2(v_2) + f_1(u_1)f_2(v_1)f_2(u_2)f_1(v_2) + f_2(u_1)f_1(v_1)f_1(u_2)f_2(v_2) + f_2(u_1)f_1(v_1)f_2(u_2)f_1(v_2)]$,

where $D_1(n_1,n_2)$, $D_2(n_1,n_2)$, $D_3(n_1,n_2)$ are respectively proportional to the conditional probabilities of classifying $Z$ into $\pi_1$, classifying $Z$ into $\pi_2$, and randomization, given $Z = z$, $U_i = u_i$, $V_i = v_i$, $i = 1, 2$. The corresponding configurations (read in the order $u_2\, u_1\, Z\, v_1\, v_2$) are
{XXZXX, or XXZXY, or YXZXX, or XXZYX, or XYZXX, or YXZXY},
{YYZYY, or YYZYX, or XYZYY, or YYZXY, or YXZYY, or XYZYX}, and
{XXZYY, or YXZYX, or XYZXY, or YYZXX},
respectively. Then using Lemma 2.7, we have

(2.43)  $P^{(2)}_{n_1,n_2}(z; U_1, U_2, V_1, V_2) \overset{\text{a.s.}}{=} (D_1(n_1,n_2) + \tfrac{1}{2}D_3(n_1,n_2))/(D_1(n_1,n_2) + D_2(n_1,n_2) + D_3(n_1,n_2))$.

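In the same manner as in section 2.2, we get

(2.44)  $\mathrm{plim}_{n\to\infty}\, P^{(2)}_{n_1,n_2}(z; U_1, U_2, V_1, V_2)$

   $= (f_1^4(z) + 4\lambda f_1^3(z)f_2(z) + \lambda^2 f_1^2(z)f_2^2(z) + 2\lambda^2 f_1^2(z)f_2^2(z)) / (f_1^4(z) + 4\lambda f_1^3(z)f_2(z) + \lambda^2 f_1^2(z)f_2^2(z) + \lambda^4 f_2^4(z) + 4\lambda^3 f_1(z)f_2^3(z) + \lambda^2 f_1^2(z)f_2^2(z) + 4\lambda^2 f_1^2(z)f_2^2(z))$

   $= f_1^2(z)(f_1(z)+3\lambda f_2(z))(f_1(z)+\lambda f_2(z))/(f_1(z)+\lambda f_2(z))^4$

   $= f_1^2(z)(f_1(z)+3\lambda f_2(z))/(f_1(z)+\lambda f_2(z))^3$.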
Thus, by the dominated convergence theorem, we get

(2.45)  $\lim_{n\to\infty} P^{(2)}_{n_1,n_2}(z) = f_1^2(z)(f_1(z)+3\lambda f_2(z))/(f_1(z)+\lambda f_2(z))^3$

and

(2.46)  $\lim_{n\to\infty} Q^{(2)}_{n_1,n_2}(z) = \lambda^2 f_2^2(z)(3f_1(z)+\lambda f_2(z))/(f_1(z)+\lambda f_2(z))^3$.

Similarly, the asymptotic conditional probability of randomization (or tie), given $Z = z$, is

(2.47)  $\lim_{n\to\infty} T^{(2)}_{n_1,n_2}(z) = 4\lambda^2 f_1^2(z) f_2^2(z)/(f_1(z)+\lambda f_2(z))^4$,

which is exactly the square of the asymptotic conditional probability of randomization of the MNN rule. (Recall that the asymptotic conditional probability of randomization of the MNN rule (see (2.29)) is $\lim_{n\to\infty} (P_{n_1,n_2,4}(z) + Q_{n_1,n_2,4}(z)) = \lim_{n\to\infty} 2P_{n_1,n_2,4}(z) = 2\lambda f_1(z)f_2(z)/(f_1(z)+\lambda f_2(z))^2$.)
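The limits (2.45) and (2.47) can be reproduced by brute-force enumeration. The sketch rests on a heuristic reading of the limiting expressions that is an assumption here, not a statement from the text: in the limit each of the $2k$ neighbors behaves like an independent label that is $X$ with probability $p = f_1(z)/(f_1(z)+\lambda f_2(z))$, and the K-stage rule inspects the pairs $(u_1,v_1), (u_2,v_2), \ldots$ in turn, stopping at the first pair with equal labels and randomizing if none is found. The numerical values of $f_1(z)$, $f_2(z)$, $\lambda$ are hypothetical.

```python
from fractions import Fraction
from itertools import product


def k_stage_limit(p, k):
    """Return (P, T): limiting probability of choosing population 1 (ties count
    one half) and limiting probability of randomization, by enumeration."""
    q = 1 - p
    prob_pi1 = Fraction(0)
    prob_tie = Fraction(0)
    for labels in product("XY", repeat=2 * k):  # (u_1, v_1, u_2, v_2, ...)
        weight = Fraction(1)
        for lab in labels:
            weight *= p if lab == "X" else q
        decision = None
        for i in range(k):
            u, v = labels[2 * i], labels[2 * i + 1]
            if u == v:  # first stage at which both neighbors agree
                decision = u
                break
        if decision == "X":
            prob_pi1 += weight
        elif decision is None:
            prob_pi1 += weight / 2
            prob_tie += weight
    return prob_pi1, prob_tie


if __name__ == "__main__":
    f1, f2, lam = Fraction(3, 10), Fraction(1, 5), Fraction(3, 2)
    p = f1 / (f1 + lam * f2)
    print(k_stage_limit(p, 2))
    print(f1 ** 2 * (f1 + 3 * lam * f2) / (f1 + lam * f2) ** 3)
```

Exact rational arithmetic is used so that the enumeration can be compared with the closed forms without rounding error.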

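When the training samples are drawn from a population which is a mixture of $\pi_1$ and $\pi_2$ with prior probabilities $\xi_1$ and $\xi_2$, then $\lambda = \xi_2/\xi_1$. If $\Pr(f_i(z) > 0,\ f_i \text{ is continuous at } z \mid \pi_j) = 1$, $i, j = 1, 2$, then the asymptotic risk of this rule is

(2.48)  $R^{(2)}$

   $= \int [\xi_2 f_2(z)\, f_1^2(z)(f_1(z)+3\lambda f_2(z))/(f_1(z)+\lambda f_2(z))^3]\, dz + \int [\xi_1 f_1(z)\, \lambda^2 f_2^2(z)(3f_1(z)+\lambda f_2(z))/(f_1(z)+\lambda f_2(z))^3]\, dz$

   $= \int [\xi_1\xi_2 f_1(z)f_2(z)/(\xi_1 f_1(z)+\xi_2 f_2(z))]\,[(\xi_1^2 f_1^2(z) + 6\xi_1\xi_2 f_1(z)f_2(z) + \xi_2^2 f_2^2(z))/(\xi_1 f_1(z)+\xi_2 f_2(z))^2]\, dz$

   $= \int [\xi_1\xi_2 f_1(z)f_2(z)/(\xi_1 f_1(z)+\xi_2 f_2(z))]\,\{1 + 2[2\xi_1\xi_2 f_1(z)f_2(z)/(\xi_1 f_1(z)+\xi_2 f_2(z))^2]\}\, dz$.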

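Comparing $R^{(2)}$ with the asymptotic risk $R$ (2.33) of the MNN rule, we have

(2.49)  $R - R^{(2)}$

   $= \int [\xi_1\xi_2 f_1(z)f_2(z)/(\xi_1 f_1(z)+\xi_2 f_2(z))]\,[(\xi_1^2 f_1^2(z) - 2\xi_1\xi_2 f_1(z)f_2(z) + \xi_2^2 f_2^2(z))/(\xi_1 f_1(z)+\xi_2 f_2(z))^2]\, dz$

   $= \int [\xi_1\xi_2 f_1(z)f_2(z)/(\xi_1 f_1(z)+\xi_2 f_2(z))]\,[(\xi_1 f_1(z)-\xi_2 f_2(z))^2/(\xi_1 f_1(z)+\xi_2 f_2(z))^2]\, dz \ge 0$.

Namely, the asymptotic risk of the two-stage MNN rule is improved over that of the MNN rule unless $\xi_1 f_1(z) = \xi_2 f_2(z)$.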

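Three-stage MNN rule.

We shall omit the details. Proceeding as before, we get

(2.50)  $\lim_{n\to\infty} P^{(3)}_{n_1,n_2}(z) = (f_1^6(z) + 6\lambda f_1^5(z)f_2(z) + 14\lambda^2 f_1^4(z)f_2^2(z) + 10\lambda^3 f_1^3(z)f_2^3(z) + \lambda^4 f_1^2(z)f_2^4(z))/(f_1(z)+\lambda f_2(z))^6$,

(2.51)  $\lim_{n\to\infty} Q^{(3)}_{n_1,n_2}(z) = (\lambda^6 f_2^6(z) + 6\lambda^5 f_1(z)f_2^5(z) + 14\lambda^4 f_1^2(z)f_2^4(z) + 10\lambda^3 f_1^3(z)f_2^3(z) + \lambda^2 f_1^4(z)f_2^2(z))/(f_1(z)+\lambda f_2(z))^6$,

and the asymptotic conditional probability of randomization is

(2.52)  $\lim_{n\to\infty} T^{(3)}_{n_1,n_2}(z) = 8\lambda^3 f_1^3(z)f_2^3(z)/(f_1(z)+\lambda f_2(z))^6$,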

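which is the cube of the corresponding probability of the MNN rule. The asymptotic risk of the three-stage MNN rule is

(2.53)  $R^{(3)}$

   $= \int [\xi_1\xi_2 f_1(z)f_2(z)/(\xi_1 f_1(z)+\xi_2 f_2(z))]\,[(\xi_1^5 f_1^5(z) + 6\xi_1^4\xi_2 f_1^4(z)f_2(z) + 14\xi_1^3\xi_2^2 f_1^3(z)f_2^2(z) + 10\xi_1^2\xi_2^3 f_1^2(z)f_2^3(z) + \xi_1\xi_2^4 f_1(z)f_2^4(z) + \xi_2^5 f_2^5(z) + 6\xi_1\xi_2^4 f_1(z)f_2^4(z) + 14\xi_1^2\xi_2^3 f_1^2(z)f_2^3(z) + 10\xi_1^3\xi_2^2 f_1^3(z)f_2^2(z) + \xi_1^4\xi_2 f_1^4(z)f_2(z))/(\xi_1 f_1(z)+\xi_2 f_2(z))^5]\, dz$

   $= \int [\xi_1\xi_2 f_1(z)f_2(z)/(\xi_1 f_1(z)+\xi_2 f_2(z))]\,[(\xi_1^5 f_1^5(z) + 7\xi_1^4\xi_2 f_1^4(z)f_2(z) + 24\xi_1^3\xi_2^2 f_1^3(z)f_2^2(z) + 24\xi_1^2\xi_2^3 f_1^2(z)f_2^3(z) + 7\xi_1\xi_2^4 f_1(z)f_2^4(z) + \xi_2^5 f_2^5(z))/(\xi_1 f_1(z)+\xi_2 f_2(z))^5]\, dz$

   $= \int [\xi_1\xi_2 f_1(z)f_2(z)/(\xi_1 f_1(z)+\xi_2 f_2(z))]\,\{1 + [2\xi_1\xi_2 f_1(z)f_2(z)/(\xi_1 f_1(z)+\xi_2 f_2(z))^5]\,[\xi_1^3 f_1^3(z) + 7\xi_1^2\xi_2 f_1^2(z)f_2(z) + 7\xi_1\xi_2^2 f_1(z)f_2^2(z) + \xi_2^3 f_2^3(z)]\}\, dz$

   $= \int [\xi_1\xi_2 f_1(z)f_2(z)/(\xi_1 f_1(z)+\xi_2 f_2(z))]\,\{1 + [2\xi_1\xi_2 f_1(z)f_2(z)/(\xi_1 f_1(z)+\xi_2 f_2(z))^5]\,[(\xi_1 f_1(z)+\xi_2 f_2(z))^3 + 4\xi_1\xi_2 f_1(z)f_2(z)(\xi_1 f_1(z)+\xi_2 f_2(z))]\}\, dz$

   $= \int [\xi_1\xi_2 f_1(z)f_2(z)/(\xi_1 f_1(z)+\xi_2 f_2(z))]\,\{1 + 2\xi_1\xi_2 f_1(z)f_2(z)/(\xi_1 f_1(z)+\xi_2 f_2(z))^2 + 2[2\xi_1\xi_2 f_1(z)f_2(z)/(\xi_1 f_1(z)+\xi_2 f_2(z))^2]^2\}\, dz$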

and

(2.54)  $R^{(2)} - R^{(3)}$

   $= \int [\xi_1\xi_2 f_1(z)f_2(z)/(\xi_1 f_1(z)+\xi_2 f_2(z))]\,\{2\xi_1\xi_2 f_1(z)f_2(z)/(\xi_1 f_1(z)+\xi_2 f_2(z))^2 - 2[2\xi_1\xi_2 f_1(z)f_2(z)/(\xi_1 f_1(z)+\xi_2 f_2(z))^2]^2\}\, dz$

   $= \int [\xi_1\xi_2 f_1(z)f_2(z)/(\xi_1 f_1(z)+\xi_2 f_2(z))]\,[2\xi_1\xi_2 f_1(z)f_2(z)/(\xi_1 f_1(z)+\xi_2 f_2(z))^2]\,[(\xi_1 f_1(z)-\xi_2 f_2(z))^2/(\xi_1 f_1(z)+\xi_2 f_2(z))^2]\, dz \ge 0$.

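We have computed the asymptotic risk for $k = 4$ and studied the results for different $k$. It appears that the asymptotic risk of the K-stage MNN rule is

(2.55)  $R^{(k)} = \int [\xi_1\xi_2 f_1(z)f_2(z)/(\xi_1 f_1(z)+\xi_2 f_2(z))]\,\{\sum_{j=0}^{k-2} [2\xi_1\xi_2 f_1(z)f_2(z)/(\xi_1 f_1(z)+\xi_2 f_2(z))^2]^j + 2[2\xi_1\xi_2 f_1(z)f_2(z)/(\xi_1 f_1(z)+\xi_2 f_2(z))^2]^{k-1}\}\, dz$

and

(2.56)  $R^{(k)} - R^{(k+1)} = \int [\xi_1\xi_2 f_1(z)f_2(z)/(\xi_1 f_1(z)+\xi_2 f_2(z))]\,[2\xi_1\xi_2 f_1(z)f_2(z)/(\xi_1 f_1(z)+\xi_2 f_2(z))^2]^{k-1}\,[(\xi_1 f_1(z)-\xi_2 f_2(z))^2/(\xi_1 f_1(z)+\xi_2 f_2(z))^2]\, dz \ge 0$.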

Now we see that the asymptotic risk is reduced at every stage unless $\xi_1 f_1(z) = \xi_2 f_2(z)$ a.e., and the rate is decreasing, and the asymptotic conditional probability of randomization at the $k$th stage is

(2.57)  $\lim_{n\to\infty} T^{(k)}_{n_1,n_2}(z) = [2\lambda f_1(z)f_2(z)/(f_1(z)+\lambda f_2(z))^2]^k = [2\xi_1\xi_2 f_1(z)f_2(z)/(\xi_1 f_1(z)+\xi_2 f_2(z))^2]^k$.

Suppose we try to eliminate randomization altogether; then the asymptotic risk is found from (2.55) to be

(2.58)  $\lim_{k\to\infty} R^{(k)} = \int [\xi_1\xi_2 f_1(z)f_2(z)/(\xi_1 f_1(z)+\xi_2 f_2(z))] \times [(\xi_1 f_1(z)+\xi_2 f_2(z))^2/(\xi_1^2 f_1^2(z)+\xi_2^2 f_2^2(z))]\, dz$

   $= \int \{[(\xi_1^2 f_1^2(z))\,\xi_2 f_2(z) + (\xi_2^2 f_2^2(z))\,\xi_1 f_1(z)]/(\xi_1^2 f_1^2(z)+\xi_2^2 f_2^2(z))\}\, dz$.

The multi-stage MNN rule can reduce (or eliminate) randomization and reduce the asymptotic risk, but unfortunately the Bayes risk cannot be attained by the rule asymptotically.
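The behavior of (2.55) across stages can be tabulated numerically. The sketch below is an illustration only: it assumes two unit-variance normal populations with hypothetical means and prior $\xi_1$, uses a plain trapezoid rule, and evaluates (2.55), its $k \to \infty$ form (2.58), and the Bayes risk side by side.

```python
import math


def norm_pdf(x, mu):
    """Density of N(mu, 1); an assumed model for this illustration."""
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)


def integrate(g, lo=-10.0, hi=11.5, steps=20_000):
    """Composite trapezoid rule for g on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (g(lo) + g(hi))
    for i in range(1, steps):
        total += g(lo + i * h)
    return total * h


def k_stage_risk(k, xi1, mu1, mu2):
    """R^(k) of (2.55), with t = 2ab/(a+b)^2, a = xi1 f1(z), b = xi2 f2(z)."""
    def g(z):
        a, b = xi1 * norm_pdf(z, mu1), (1.0 - xi1) * norm_pdf(z, mu2)
        s = a + b
        t = 2.0 * a * b / s ** 2
        series = sum(t ** j for j in range(k - 1)) + 2.0 * t ** (k - 1)
        return a * b / s * series
    return integrate(g)


def limiting_risk(xi1, mu1, mu2):
    """Right-hand side of (2.58): integral of ab(a+b)/(a^2+b^2)."""
    def g(z):
        a, b = xi1 * norm_pdf(z, mu1), (1.0 - xi1) * norm_pdf(z, mu2)
        return a * b * (a + b) / (a * a + b * b)
    return integrate(g)


def bayes_risk(xi1, mu1, mu2):
    """R*: integral of min(a, b)."""
    return integrate(lambda z: min(xi1 * norm_pdf(z, mu1),
                                   (1.0 - xi1) * norm_pdf(z, mu2)))


if __name__ == "__main__":
    for k in range(1, 6):
        print(k, k_stage_risk(k, 0.5, 0.0, 1.5))
    print("limit", limiting_risk(0.5, 0.0, 1.5), "Bayes", bayes_risk(0.5, 0.0, 1.5))
```

With $k = 1$ the formula reduces to the MNN risk (2.33); the later rows should decrease toward the (2.58) value, which remains above the Bayes risk.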

2.4  The asymptotic conditional PMC of the MK$_n$-NN rule.

We shall obtain the asymptotic conditional PMC when $k_{n,i}/n \to 0$ as $n \to \infty$. According to Lemma 2.7, we can assume without loss of generality (with probability one) that $U_n$ (the $k_{n,1}$th left nearest neighbor of $z$) and $V_n$ (the $k_{n,2}$th right nearest neighbor of $z$) are well defined. To avoid randomization, we set $k_{n,1} + k_{n,2} = 2k_n + 1$.

Define

(2.59)  $h_1(j;n) = n_1(n_1-1) \binom{n_1-2}{j} \binom{n_2}{2k_n-1-j}\, [F_1(v)-F_1(u)]^{j}\, [1-(F_1(v)-F_1(u))]^{n_1-2-j}\, [F_2(v)-F_2(u)]^{2k_n-1-j}\, [1-(F_2(v)-F_2(u))]^{n_2-2k_n+1+j}\, f_1(u)f_1(v)$,

(2.60)  $h_2(j;n) = n_1 n_2 \binom{n_1-1}{j} \binom{n_2-1}{2k_n-1-j}\, [F_1(v)-F_1(u)]^{j}\, [1-(F_1(v)-F_1(u))]^{n_1-1-j}\, [F_2(v)-F_2(u)]^{2k_n-1-j}\, [1-(F_2(v)-F_2(u))]^{n_2-2k_n+j}\, [f_1(u)f_2(v)+f_1(v)f_2(u)]$,

(2.61)  $h_3(j;n) = n_2(n_2-1) \binom{n_1}{j} \binom{n_2-2}{2k_n-1-j}\, [F_1(v)-F_1(u)]^{j}\, [1-(F_1(v)-F_1(u))]^{n_1-j}\, [F_2(v)-F_2(u)]^{2k_n-1-j}\, [1-(F_2(v)-F_2(u))]^{n_2-2k_n-1+j}\, f_2(u)f_2(v)$,

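and

(2.62)  $a_{n,1} = \sum_{j=k_n-1}^{2k_n-1} h_1(j;n)$,   $b_{n,1} = \sum_{j=0}^{k_n-2} h_1(j;n)$,

(2.63)  $a_{n,2} = \sum_{j=k_n}^{2k_n-1} h_2(j;n)$,   $b_{n,2} = \sum_{j=0}^{k_n-1} h_2(j;n)$,

(2.64)  $a_{n,3} = \sum_{j=k_n+1}^{2k_n-1} h_3(j;n)$,   $b_{n,3} = \sum_{j=0}^{k_n} h_3(j;n)$,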


where $a_{n,1}$ (or $b_{n,1}$), $a_{n,2}$ (or $b_{n,2}$), $a_{n,3}$ (or $b_{n,3}$) are respectively proportional to the conditional probabilities of classifying $Z$ into $\pi_1$ (or $\pi_2$) when both $U_n$ and $V_n$ are $X$ observations, when only one of $U_n$ and $V_n$ is an $X$ observation, and when both $U_n$ and $V_n$ are $Y$ observations, given $Z = z$, $U_n = u$, $V_n = v$.

Let $P_{n_1,n_2,k_n}(z;u,v)$ (or $Q_{n_1,n_2,k_n}(z;u,v)$) be the conditional probability that the MK$_n$-NN rule classifies the observation $Z$ into $\pi_1$ (or $\pi_2$), given $Z = z$, $U_n = u$, $V_n = v$. Then

(2.65)  $P_{n_1,n_2,k_n}(z;u,v) \overset{\text{a.s.}}{=} (a_{n,1}+a_{n,2}+a_{n,3})/(a_{n,1}+a_{n,2}+a_{n,3}+b_{n,1}+b_{n,2}+b_{n,3})$,

        $Q_{n_1,n_2,k_n}(z;u,v) = 1 - P_{n_1,n_2,k_n}(z;u,v)$.

As before, we let $\lambda = \lim n_2/n_1$ and $\theta = f_2(z)/f_1(z)$.

Lemma 2.9. Suppose that $z$ is a continuity point of both $f_1$ and $f_2$ with $f_1(z) > 0$, $f_2(z) > 0$. If $k_n \to \infty$ and $k_n/n \to 0$ as $n \to \infty$, and $0 < \lambda < \infty$, then

(2.66)  $\mathrm{plim}_{n\to\infty}\, a_{n,i}/b_{n,i} = 0$ if $\lambda\theta > 1$, and $= \infty$ if $\lambda\theta < 1$,   $i = 1, 2, 3$.

Proof. We shall prove the case $i = 1$ and $\lambda\theta < 1$, since the others are of the same type and can be similarly proved. Let $\gamma_n = k_n - 1$, and


(2.67)  $\alpha(u,v) = \dfrac{F_1(v)-F_1(u)}{1-(F_1(v)-F_1(u))} \cdot \dfrac{1-(F_2(v)-F_2(u))}{F_2(v)-F_2(u)}$.


2v  +1 

Tn  n -2  n 

( J )(2y +l-J)“  (U,T) 


"*n  1 n-2 


s( } )(_  V(“-v) 

J=0  J 2^n+1  J 


Y +1  „ 

Tn  n,  -2  n 


S W<Y 

1=0  Yn  J Yn  1 3 

n. -2  n 


v +i 


2Vn+l.  (ni"2“Vn)J  (n0-Y^-l): 


X*  ^'n-^  '"I"  'n'm  v 2 Tn  7*  j, 

wVJ  1;-"i:s-yJ):  fryyi+j):  « ( 
Z'W1'  S-g-*n):  (wD!  -J 


or3(u,v) 


v>  / n V x ’n'  ' 2 'n  ' -1/  \ 

&V>  ^V^Vi)!  (»?VH)!  • (u'v) 

J) 

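By Lemma 2.8, $\alpha(u,v) \to \theta^{-1}$ as $n \to \infty$ and $k_n/n \to 0$ as $n \to \infty$. Suppose $\alpha(u,v) \to \theta^{-1}$ as $n \to \infty$. If $\lambda\theta < 1$, considering $u$, $v$ as non-stochastic, there exist a constant $c$ ($1 < c < 1/(\lambda\theta)$) and a positive integer $N$ such that for all $n > N$ we have

(2.68)  $\dfrac{(n_1-2-\gamma_n)!\,(n_2-\gamma_n-1)!}{(n_1-2-\gamma_n-j)!\,(n_2-\gamma_n-1+j)!}\, \alpha^{j}(u,v)$,   $j = 0, 1, \ldots, \gamma_n+1$,

        $\ge \{[(n_1-1-\gamma_n-j)/(n_2-\gamma_n-1+j)]\,\alpha(u,v)\}^{j}$

        $\ge \{[(n_1-2-2\gamma_n)/n_2]\,\alpha(u,v)\}^{j} \ge c^{j}$.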


Similarly, 

(2.69)  (nr2"^n): 


(n2-Y  -D!  , 

ivvhit  * (u*v) 


5 ([(n!“2-Yn+j)/(n  -y  -j)]  o(u,v))“J 
$\le \{[(n_1-2)/(n_2-2\gamma_n)]\,\alpha(u,v)\}^{-j} \le c^{-j}$.


Hence  for  n > N 


, we  have 

Yn+1  2Y  +1  , Yn  2y  +1 


n. 


« t-T-'*  4 u cy  tj.  . 

Ai>  s(“)cJ/S(  ■>  )c-J 
111,1  j=0  VJ  j=l  Yn  J 


Moreover, 


Yn+1  2y  +1 


2yn+i  ^ +1 


* T * 4 T_  w t-T » 

S(  n )<J-C  n S ( “ )cJ 

i-n  4^.,  J 


j=0  Tn 


J -V. 


-V  2y  +1  2Yn+1  2y  +1  J . 2Yn+1"j 

c n(l+c)  « £ < J ><&>  <&> 


2y. 


> [(l+c)/yE]  n (l+c)/2. 


since  c/l+c  > ^ and 


.2y  +1  2Y, 


Yn  2y  +1  Yn  2y  +1  tT  -r*  CT 

S(“  )c-J<S(  “ .)<«2)  " .2  » 

J»1  Yn  J j=l  Yn  J 


Therefore  for  n > N 


(2.70)  $a_{n,1}/b_{n,1} \ge [(1+c)/(2\sqrt{c})]^{2\gamma_n}\,[(1+c)/2] > [(1+c)/(2\sqrt{c})]^{2\gamma_n}$.

The expression on the right-hand side of (2.70) tends to $\infty$ as $n \to \infty$. Hence when $\lambda\theta < 1$,

$\mathrm{plim}_{n\to\infty}\, a_{n,1}/b_{n,1} = \infty$.

The proof is complete.
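The growth of $a_{n,1}/b_{n,1}$ can be examined with exact integer arithmetic. The sketch below relies on an assumption drawn from the reconstructed form of (2.59): for fixed $u$, $v$ the summands $h_1(j;n)$ are proportional to $\binom{n_1-2}{j}\binom{n_2}{2\gamma_n+1-j}\,\alpha^j$, so the common factor cancels in the ratio. The sample sizes and the value of $\alpha$ (standing in for its limit $\theta^{-1}$) are hypothetical.

```python
from fractions import Fraction
from math import comb


def a_over_b(n1, n2, k_n, alpha):
    """Ratio a_{n,1}/b_{n,1} with gamma_n = k_n - 1 (requires k_n >= 2).

    Terms are taken proportional to C(n1-2, j) C(n2, 2*gamma+1-j) alpha^j,
    which is an assumed simplification of h_1(j; n) for fixed u and v.
    """
    gamma = k_n - 1

    def term(j):
        return comb(n1 - 2, j) * comb(n2, 2 * gamma + 1 - j) * alpha ** j

    a = sum(term(j) for j in range(gamma, 2 * gamma + 2))  # j = gamma, ..., 2*gamma+1
    b = sum(term(j) for j in range(0, gamma))              # j = 0, ..., gamma-1
    return a / b


if __name__ == "__main__":
    alpha = Fraction(2)  # plays the role of 1/theta; with n2/n1 = 1, lambda*theta = 0.5
    for k_n in (5, 10, 20, 40):
        print(k_n, float(a_over_b(2000, 2000, k_n, alpha)))
```

For $\lambda\theta < 1$ the printed ratios should increase rapidly with $k_n$, in line with the divergence claimed in Lemma 2.9.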


Theorem 2.2. Under the assumptions of Lemma 2.9, we have

(2.71)  $\lim_{n\to\infty} P_{n_1,n_2,k_n}(z) = \lim_{n\to\infty} (1 - Q_{n_1,n_2,k_n}(z)) = 1$ if $\lambda\theta < 1$, and $= 0$ if $\lambda\theta > 1$.


Proof.  $\mathrm{plim}_{n\to\infty}\, P_{n_1,n_2,k_n}(z;u,v)$

   $= \mathrm{plim}_{n\to\infty}\, (a_{n,1}+a_{n,2}+a_{n,3})/(a_{n,1}+a_{n,2}+a_{n,3}+b_{n,1}+b_{n,2}+b_{n,3})$

   $= \mathrm{plim}_{n\to\infty} \sum_{i=1}^{3} [a_{n,i}/(a_{n,i}+b_{n,i})]\,[(a_{n,i}+b_{n,i})/(a_{n,1}+a_{n,2}+a_{n,3}+b_{n,1}+b_{n,2}+b_{n,3})]$.


From Lemma 2.9, we have

$\mathrm{plim}_{n\to\infty}\, a_{n,i}/(a_{n,i}+b_{n,i}) = 1$ if $\lambda\theta < 1$, and $= 0$ if $\lambda\theta > 1$,   $i = 1, 2, 3$.

Therefore,

$\mathrm{plim}_{n\to\infty}\, P_{n_1,n_2,k_n}(z;u,v) = 1$ if $\lambda\theta < 1$, and $= 0$ if $\lambda\theta > 1$.


Thus, by the Lebesgue dominated convergence theorem, we get

$\lim_{n\to\infty} P_{n_1,n_2,k_n}(z) = 1$ if $\lambda\theta < 1$, and $= 0$ if $\lambda\theta > 1$.


In order to apply the result we need to assume that $\lambda\theta \ne 1$ a.e. Furthermore, if the training samples are drawn from a population which is a mixture of $\pi_1$ and $\pi_2$ in the proportions $\xi_1$ and $\xi_2$ ($\lambda = \xi_2/\xi_1$), then the asymptotic risk of the MK$_n$-NN rule is

$R = \int_{\{\xi_2 f_2 > \xi_1 f_1\}} \xi_1 f_1(z)\, dz + \int_{\{\xi_2 f_2 < \xi_1 f_1\}} \xi_2 f_2(z)\, dz = \int \min\{\xi_1 f_1(z),\ \xi_2 f_2(z)\}\, dz = R^*$, the Bayes risk.


CHAPTER  3 


ASYMPTOTIC  PMC  WITH  RATE  OF  SOME  SPECIFIC 
RULES  BASED  ON  U-STATISTICS 

3.0  Introduction 

Consider a random variable $X$ which is distributed as $F_i$ in the population $\pi_i$ ($i = 0, 1, 2$). The problem is to decide between $F_0 = F_1$ and $F_0 = F_2$ when it is known that $F_1$ and $F_2$ are different.

Let $\mathbf{X}_{0n_0} = (X_{01}, \ldots, X_{0n_0})$ be $n_0$ independent observations on $X$ from the population $\pi_0$, $\mathbf{X}_{1n_1} = (X_{11}, \ldots, X_{1n_1})$ be $n_1$ independent observations on $X$ from the population $\pi_1$, and $\mathbf{X}_{2n_2} = (X_{21}, \ldots, X_{2n_2})$ be $n_2$ independent observations on $X$ from the population $\pi_2$.

Define a function $C$ as

(3.1)  $C(u) = 1$ if $u > 0$,  $C(u) = 0$ if $u < 0$.

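The Wilcoxon statistics $W_{01}$, $W_{02}$, $W_{12}$ are then defined as follows:

(3.2)  $W_{01} = \dfrac{1}{n_0 n_1} \sum_{1\le i\le n_0,\ 1\le j\le n_1} C(X_{0i} - X_{1j})$,

       $W_{02} = \dfrac{1}{n_0 n_2} \sum_{1\le i\le n_0,\ 1\le k\le n_2} C(X_{0i} - X_{2k})$,

       $W_{12} = \dfrac{1}{n_1 n_2} \sum_{1\le j\le n_1,\ 1\le k\le n_2} C(X_{1j} - X_{2k})$.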


Das Gupta (19#0) considers a classification rule which decides $\pi_0 = \pi_i$ ($i = 1, 2$) if $|W_{0i} - \tfrac{1}{2}| = \min_{j=1,2} |W_{0j} - \tfrac{1}{2}|$. Under the slight restriction on the distribution functions that $\int F_1\, dF_2 > \tfrac{1}{2}$, Hudimoto (1964) also proposes a rule which is equivalent to classifying $\pi_0$ into $\pi_1$ if $(W_{01} + W_{02} - 1) < 0$. (By symmetry, if $\int F_2\, dF_1 > \tfrac{1}{2}$ is assumed, decide $\pi_0 = \pi_2$ when $(W_{01} + W_{02} - 1) < 0$.) When it is not certain whether $\int F_1\, dF_2 > \tfrac{1}{2}$ or $\int F_2\, dF_1 > \tfrac{1}{2}$, Chanda and Lee (1975), modifying Hudimoto's rule, suggest a rule which decides $\pi_0 = \pi_1$ if $(W_{12} - \tfrac{1}{2})(W_{01} + W_{02} - 1) > 0$.
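The three rules described above can be sketched in a few lines. The function names are invented for this illustration, the tie-breaking choices (for example, $C(0) = 0$ and preferring $\pi_1$ when the two Das Gupta distances coincide) are assumptions, and the direction of the differences inside $C(\cdot)$ follows the reconstruction of (3.2) given here.

```python
def c(u):
    """Indicator of (3.1); the value at u = 0 is set to 0 as an assumption."""
    return 1 if u > 0 else 0


def wilcoxon(a, b):
    """Two-sample Wilcoxon statistic: proportion of pairs with a_i > b_j."""
    return sum(c(x - y) for x in a for y in b) / (len(a) * len(b))


def das_gupta_rule(x0, x1, x2):
    """Choose the population whose W_{0i} is closest to 1/2."""
    d1 = abs(wilcoxon(x0, x1) - 0.5)
    d2 = abs(wilcoxon(x0, x2) - 0.5)
    return 1 if d1 <= d2 else 2


def hudimoto_rule(x0, x1, x2):
    """Assumes integral F1 dF2 > 1/2; decide pi_1 if W01 + W02 - 1 < 0."""
    return 1 if wilcoxon(x0, x1) + wilcoxon(x0, x2) - 1.0 < 0 else 2


def chanda_lee_rule(x0, x1, x2):
    """Decide pi_1 if (W12 - 1/2)(W01 + W02 - 1) > 0."""
    w12 = wilcoxon(x1, x2)
    stat = (w12 - 0.5) * (wilcoxon(x0, x1) + wilcoxon(x0, x2) - 1.0)
    return 1 if stat > 0 else 2


if __name__ == "__main__":
    x1 = [0.1, 0.2, 0.3, 0.4]
    x2 = [1.1, 1.2, 1.3, 1.4]
    x0 = [0.15, 0.25, 0.35]
    print(das_gupta_rule(x0, x1, x2), hudimoto_rule(x0, x1, x2), chanda_lee_rule(x0, x1, x2))
```

On well-separated samples all three rules should agree; they can differ when the ordering assumption behind Hudimoto's rule fails.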

In this chapter, the asymptotic probabilities of misclassification of the three rules mentioned above are obtained together with the rate of convergence when $n_1$ and $n_2$ approach infinity with $n_0$ fixed. Also, Hudimoto's idea is applied to general classification problems. An example of a two-sided classification problem which utilizes the Lehmann statistic (Lehmann, 1951) is
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which  is  also  a U-statistic  in  X.  and  considering 

*0n  88  fixed*  We  define  a function  h (x. x,  ; 

0 cl,c2  icl 

X21’***,X2c2^^0n0^  by  taklti8  the  conditional  expectation  of 

h(xu,...,Xlm^;X2l,...,X2m^|x0n^)  given  xn x^J 

x21....,x  : 


(3*6)  hCl ,c2(xll xlCl ;x2l* • * * »X2C  > 


CthCx. 


,X1C]L  ,XlCl+l’*  * * ^lmj  ;x2l x2c  • 


X2c2+l*-*»X2m2l5ono)3 
for  ; i=l,2. 


In  particular,  define 


(3.7)  7(Xo„  ) - e(h(  i I*  )) 
o ^no 


(3-8)  ’ V“[h=l-«2(X» X1VX21 X2calion0» 


(3-9)  ^(Son^  ■ ) 
which can be expressed as


where 


(3.10)  N a minfn^jn^) 

Therefore  if  lim  N/nt  exists,  ) can  asymptotically  be 


written  as 


(3.11)  o2^  ) i lAftp3^  ),  M2  . H 


1 n * a 


a function  of  X_  not  depending  on  N, 

0 

Using the notation introduced above, we now give the following proposition.

Proposition 3.1  Assume that $\lim N/n_i$ exists and that, with probability one,

(3.12)  $\mathcal{E}|h(\,\cdot\,;\,\cdot \mid \mathbf{X}_{0n_0})|^2 < \infty$


(3.13)  ) > 0 «a  N 
Then

(3.14)  $\Pr\{U < 0\} \to \Pr\{\gamma(\mathbf{X}_{0n_0}) < 0\} + \tfrac12 \Pr\{\gamma(\mathbf{X}_{0n_0}) = 0\}$ as $N \to \infty$.


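Proof. Conditioning on $\mathbf{X}_{0n_0}$, asymptotic normality theorems (Hoeffding (1948), Lehmann (1951)) for U-statistics state that

(3.15)  $\Pr\{U < 0 \mid \mathbf{X}_{0n_0}\} \doteq \Phi(-\gamma(\mathbf{X}_{0n_0})/\sigma(\mathbf{X}_{0n_0}))$ as $N \to \infty$.

Combining with (3.11), we have

(3.16)  $\Pr\{U < 0 \mid \mathbf{X}_{0n_0}\} \doteq \Phi(-M\gamma(\mathbf{X}_{0n_0})\varphi(\mathbf{X}_{0n_0}))$ as $N \to \infty$.

Hence

(3.17)  $\Pr\{U < 0\} = \mathcal{E}\Pr\{U < 0 \mid \mathbf{X}_{0n_0}\}$
        $\to \Pr\{\gamma(\mathbf{X}_{0n_0}) < 0\} + \tfrac12 \Pr\{\gamma(\mathbf{X}_{0n_0}) = 0\}$

since $\varphi(\mathbf{X}_{0n_0})$ is positive with probability one.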


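Proposition 3.2  Assume that $\lim N/n_i$ exists and that

(3.18)  $\mathcal{E}|f|^3 < \infty$,

(3.19)  $\mathcal{E}|f|^{2r} < \infty$ for a positive $r$,

(3.20)  for sufficiently small $\epsilon > 0$, $\Pr\{|\gamma(\mathbf{X}_{0n_0})| < \epsilon\} = O(\epsilon)$,

and

(3.21)  $N\sigma^2(\mathbf{X}_{0n_0}) > a > 0$ as $N \to \infty$.

Then

(3.22)  $\Pr\{U < 0\} = \Pr\{\gamma(\mathbf{X}_{0n_0}) < 0\} + O(N^{-r/(2r+1)})$ as $N \to \infty$.

Proof. Conditioning on $\mathbf{X}_{0n_0}$ and following the proof of Theorem 3.1 of Grams and Serfling (1973) with (3.21) we have

(3.23)  $\Pr\{U < 0 \mid \mathbf{X}_{0n_0}\} = \Phi(-\gamma(\mathbf{X}_{0n_0})/\sigma(\mathbf{X}_{0n_0})) + N^{-r/(2r+1)} K(\mathbf{X}_{0n_0})$
        $= \Phi(-M\gamma(\mathbf{X}_{0n_0})\varphi(\mathbf{X}_{0n_0})) + N^{-r/(2r+1)} K(\mathbf{X}_{0n_0})$,

where $K(\mathbf{X}_{0n_0})$ is a function (independent of $N$) depending on $\mathbf{X}_{0n_0}$ through the 3rd and $2r$th absolute moments of $h_{0,1}(\,\cdot \mid \mathbf{X}_{0n_0})$ and $h_{1,0}(\,\cdot \mid \mathbf{X}_{0n_0})$.

Hence, by (3.18), (3.19), and (3.23),

(3.24)  $\Pr\{U < 0\} = \mathcal{E}\,\Phi(-M\gamma(\mathbf{X}_{0n_0})\varphi(\mathbf{X}_{0n_0})) + O(N^{-r/(2r+1)})$ as $N \to \infty$.

Since $\Phi(-M) = O(M^{-k})$ for any positive $k$ as $M \to \infty$, we have for every $\epsilon > 0$, as $N \to \infty$,

(3.25)  $\Pr\{U < 0\} = \int_{-\infty}^{-M^{-1+\epsilon}} \Phi(-M\gamma\varphi)\,dP(\gamma\varphi) + \int_{-M^{-1+\epsilon}}^{M^{-1+\epsilon}} \Phi(-M\gamma\varphi)\,dP(\gamma\varphi) + \int_{M^{-1+\epsilon}}^{\infty} \Phi(-M\gamma\varphi)\,dP(\gamma\varphi) + O(N^{-r/(2r+1)})$
        $= \Pr\{\gamma\varphi < -M^{-1+\epsilon}\} + O(N^{-r/(2r+1)})$
        $= \Pr\{\gamma\varphi < 0\} - \Pr\{-M^{-1+\epsilon} < \gamma\varphi < 0\} + O(N^{-r/(2r+1)})$
        $= \Pr\{\gamma(\mathbf{X}_{0n_0}) < 0\} + O(M^{-1+\epsilon}) + O(N^{-r/(2r+1)})$
        $= \Pr\{\gamma(\mathbf{X}_{0n_0}) < 0\} + O(N^{-r/(2r+1)})$

because $\epsilon$ is arbitrary.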


Corollary 3.2  Assume that $f$ has finite moments of all orders. If (3.20) and (3.21) hold, then for every $\epsilon > 0$

(3.26)  $\Pr\{U < 0\} = \Pr\{\gamma(\mathbf{X}_{0n_0}) < 0\} + O(N^{-1/2+\epsilon})$ as $N \to \infty$.


3.2  Asymptotic PMC's

Let $P_N(D)$, $P_N(H)$, $P_N(C)$ be the probabilities of classifying $\pi_0$ into $\pi_1$ for Das Gupta's rule, Hudimoto's rule, and Chanda and Lee's rule, respectively. Note that $1 - P_N(\cdot)$ is the PMC if $\mathbf{X}_{0n_0} \sim F_1$ and $P_N(\cdot)$ is the PMC if $\mathbf{X}_{0n_0} \sim F_2$. Therefore, to study PMC, it is sufficient to study $P_N(\cdot)$.

If the conditions (3.20), (3.21) are assumed to be satisfied, then, since the functions are all bounded (in fact between $-1$ and $1$), the moments of all orders are finite. Moreover, the product of U-statistics is again a U-statistic. Then from Corollary 3.2 we have, for every $\epsilon > 0$,

(3.29)  $P_N(D) = \Pr\{|W_{01} - \tfrac12| < |W_{02} - \tfrac12|\}$
        $= \Pr\{(W_{01} - W_{02})(W_{01} + W_{02} - 1) < 0\}$
        $= \Pr\{[\frac{1}{n_0}\sum_{i=1}^{n_0}(F_1(X_{0i}) - F_2(X_{0i}))][\frac{1}{n_0}\sum_{i=1}^{n_0}(F_1(X_{0i}) + F_2(X_{0i})) - 1] < 0\} + O(N^{-1/2+\epsilon})$ as $N \to \infty$

(3.30)  $P_N(H) = \Pr\{W_{01} + W_{02} - 1 < 0\}$
        $= \Pr\{\frac{1}{n_0}\sum_{i=1}^{n_0}(F_1(X_{0i}) + F_2(X_{0i})) - 1 < 0\} + O(N^{-1/2+\epsilon})$ as $N \to \infty$

(3.31)  $P_N(C) = \Pr\{(W_{12} - \tfrac12)(W_{01} + W_{02} - 1) > 0\}$
        $= \Pr\{\frac{1}{n_0}\sum_{i=1}^{n_0}(F_1(X_{0i}) + F_2(X_{0i})) - 1 < 0\} + O(N^{-1/2+\epsilon})$  if $\int F_1\,dF_2 > \tfrac12$,
        $= \Pr\{\frac{1}{n_0}\sum_{i=1}^{n_0}(F_1(X_{0i}) + F_2(X_{0i})) - 1 > 0\} + O(N^{-1/2+\epsilon})$  if $\int F_1\,dF_2 < \tfrac12$,
        as $N \to \infty$.
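The second equality in (3.29) uses only the elementary identity $(a - \tfrac12)^2 - (b - \tfrac12)^2 = (a - b)(a + b - 1)$, so that $|a - \tfrac12| < |b - \tfrac12|$ exactly when $(a - b)(a + b - 1) < 0$. The following minimal Python sketch checks this on a grid of exact rationals; the function names and the grid are illustrative assumptions and are not part of the original text.

```python
from fractions import Fraction


def das_gupta_event(w01, w02):
    """True when |w01 - 1/2| < |w02 - 1/2| (first event in (3.29))."""
    half = Fraction(1, 2)
    return abs(w01 - half) < abs(w02 - half)


def product_event(w01, w02):
    """True when (w01 - w02)(w01 + w02 - 1) < 0 (second event in (3.29))."""
    return (w01 - w02) * (w01 + w02 - 1) < 0


# Exact comparison of the two events on a grid of rationals in [0, 1].
grid = [Fraction(i, 20) for i in range(21)]
mismatches = [(a, b) for a in grid for b in grid
              if das_gupta_event(a, b) != product_event(a, b)]
```

An empty `mismatches` list is what the identity predicts; exact fractions are used so that boundary cases such as $a = b$ are not blurred by rounding.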




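3.3  A general rule based on U-statistics and an example.

We shall generally describe Hudimoto's idea. Suppose we have U-statistics $V_1$ and $V_2$ defined by

(3.32)  $V_1 = V_1(\mathbf{X}_{0n_0}; \mathbf{X}_{1n_1}) = \binom{n_0}{m_0}^{-1}\binom{n_1}{m_1}^{-1} \sum f_1(X_{0\alpha_1}, \dots, X_{0\alpha_{m_0}}; X_{1\beta_1}, \dots, X_{1\beta_{m_1}})$,
        the sum being over $1 \le \alpha_1 < \dots < \alpha_{m_0} \le n_0$ and $1 \le \beta_1 < \dots < \beta_{m_1} \le n_1$,

(3.33)  $V_2 = V_2(\mathbf{X}_{0n_0}; \mathbf{X}_{2n_2}) = \binom{n_0}{m_0}^{-1}\binom{n_2}{m_2}^{-1} \sum f_1(X_{0\alpha_1}, \dots, X_{0\alpha_{m_0}}; X_{2\gamma_1}, \dots, X_{2\gamma_{m_2}})$,
        the sum being over $1 \le \alpha_1 < \dots < \alpha_{m_0} \le n_0$ and $1 \le \gamma_1 < \dots < \gamma_{m_2} \le n_2$,

such that for some $\theta_1 < \theta_2$ we have

(3.34)  $\mathcal{E}(V_1 \mid \mathbf{X}_{0n_0} \in \pi_1) = \theta_1$,  $\mathcal{E}(V_2 \mid \mathbf{X}_{0n_0} \in \pi_1) = \theta_2$

(3.35)  $\mathcal{E}(V_1 \mid \mathbf{X}_{0n_0} \in \pi_2) = \theta_2$,  $\mathcal{E}(V_2 \mid \mathbf{X}_{0n_0} \in \pi_2) = \theta_1$.

Then the rule will be the one that classifies $\pi_0$ into $\pi_1$ if $V_1 < V_2$ and $\pi_2$ if $V_1 > V_2$. Let

(3.36)  $\theta = \theta_2 - \theta_1$,  $n = \min(n_0, n_1, n_2)$,

(3.37)  $f(X_{01}, \dots, X_{0m_0}; X_{11}, \dots, X_{1m_1}; X_{21}, \dots, X_{2m_2})$
        $= f_1(X_{01}, \dots, X_{0m_0}; X_{11}, \dots, X_{1m_1}) - f_1(X_{01}, \dots, X_{0m_0}; X_{21}, \dots, X_{2m_2})$,

(3.38)  $U_n = V_1 - V_2$

(3.39)  $U_n = \binom{n_0}{m_0}^{-1}\binom{n_1}{m_1}^{-1}\binom{n_2}{m_2}^{-1} \sum f(X_{0\alpha_1}, \dots, X_{0\alpha_{m_0}}; X_{1\beta_1}, \dots, X_{1\beta_{m_1}}; X_{2\gamma_1}, \dots, X_{2\gamma_{m_2}})$

and

(3.40)  $\mathcal{E}(U_n \mid \mathbf{X}_{0n_0} \in \pi_1) = -\theta$,  $\mathcal{E}(U_n \mid \mathbf{X}_{0n_0} \in \pi_2) = \theta$.

Consequently, the rule is simply classifying $\pi_0$ into $\pi_1$ if $U_n < 0$ and $\pi_2$ if $U_n > 0$. Also $U_n$ is again a U-statistic.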

Regarding the asymptotic probabilities of misclassification as the sizes of the training samples tend to infinity, we shall refer to Proposition 3.1, Proposition 3.2, or Corollary 3.2.

We now give applications of the above results to a specific example based on the Lehmann statistic (Lehmann (1951)). This example is constructed for general two-sided classification problems. Only continuity and distinctness of the distribution functions are assumed.

We define the measure of discrepancy between two distribution functions $F_1$ and $F_2$ as

(3.41)  $\Delta(F_1, F_2) = \int (F_1 - F_2)^2 \, d\left(\frac{F_1 + F_2}{2}\right)$

Lehmann (1951) proves the following:

Lemma 3.1  $F_1 = F_2$ iff $\Delta(F_1, F_2) = 0$.

Let $X_1$, $X_2$ be independent random variables with distribution function $F_1$, and let $Y_1$, $Y_2$ be independent random variables with distribution function $F_2$. We designate $\max(X_1, X_2)$ as $X_1 \vee X_2$, and $\min(X_1, X_2)$ as $X_1 \wedge X_2$. When $(X_1, X_2)$ and $(Y_1, Y_2)$ are independent, Lehmann (1951) proves:

Lemma 3.2

(3.42)  $\Pr\{X_1 \vee X_2 < Y_1 \wedge Y_2 \text{ or } Y_1 \vee Y_2 < X_1 \wedge X_2\} = 1/3 + 2\Delta(F_1, F_2)$

From (3.42) we see immediately that

(3.43)  $0 \le \Delta(F_1, F_2) \le 1/3$
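As a quick sanity check of Lemma 3.1 and Lemma 3.2 in the null case: when $F_1 = F_2$ is continuous, all $4!$ rank orderings of $(X_1, X_2, Y_1, Y_2)$ are equally likely, so the probability in (3.42) can be obtained by counting. The sketch below (Python, with illustrative names that are not from the source) does the count exactly.

```python
from fractions import Fraction
from itertools import permutations


def separated(x1, x2, y1, y2):
    """Event in (3.42): both X's lie below both Y's, or both Y's below both X's."""
    return max(x1, x2) < min(y1, y2) or max(y1, y2) < min(x1, x2)


# Under F1 = F2 (continuous) every ordering of the four ranks is equally likely.
orderings = list(permutations(range(4)))
hits = sum(separated(*ranks) for ranks in orderings)
prob_equal = Fraction(hits, len(orderings))

# Solving (3.42) for Delta in this case.
delta_equal = (prob_equal - Fraction(1, 3)) / 2
```

The count gives probability 1/3 and hence $\Delta = 0$, in agreement with Lemma 3.1.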


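Consider the statistics $V_1$ and $V_2$

(3.44)  $V_1 = \binom{n_0}{2}^{-1}\binom{n_1}{2}^{-1} \sum g(X_{0\alpha_1}, X_{0\alpha_2}; X_{1\beta_1}, X_{1\beta_2})$,
        the sum being over $1 \le \alpha_1 < \alpha_2 \le n_0$ and $1 \le \beta_1 < \beta_2 \le n_1$,

(3.45)  $V_2 = \binom{n_0}{2}^{-1}\binom{n_2}{2}^{-1} \sum g(X_{0\alpha_1}, X_{0\alpha_2}; X_{2\gamma_1}, X_{2\gamma_2})$,
        the sum being over $1 \le \alpha_1 < \alpha_2 \le n_0$ and $1 \le \gamma_1 < \gamma_2 \le n_2$,

where

(3.46)  $g(X_1, X_2; Y_1, Y_2) = 1$ if $X_1 \vee X_2 < Y_1 \wedge Y_2$ or $X_1 \wedge X_2 > Y_1 \vee Y_2$,
        $g(X_1, X_2; Y_1, Y_2) = 0$ otherwise.

Define

(3.47)  $U_n = V_1 - V_2$
        $= \binom{n_0}{2}^{-1}\binom{n_1}{2}^{-1} \sum g(X_{0\alpha_1}, X_{0\alpha_2}; X_{1\beta_1}, X_{1\beta_2}) - \binom{n_0}{2}^{-1}\binom{n_2}{2}^{-1} \sum g(X_{0\alpha_1}, X_{0\alpha_2}; X_{2\gamma_1}, X_{2\gamma_2})$
        $= \left[\binom{n_0}{2}\binom{n_1}{2}\binom{n_2}{2}\right]^{-1} \sum f(X_{0\alpha_1}, X_{0\alpha_2}; X_{1\beta_1}, X_{1\beta_2}; X_{2\gamma_1}, X_{2\gamma_2})$

Then from Lemma 3.1 and Lemma 3.2 we have

(3.48)  $\mathcal{E}(U_n \mid \mathbf{X}_{0n_0} \in \pi_1) = -2\Delta$,  $\mathcal{E}(U_n \mid \mathbf{X}_{0n_0} \in \pi_2) = 2\Delta$

Therefore, the rule is to classify $\pi_0$ into $\pi_1$ or $\pi_2$ according to $U_n < 0$ or $U_n > 0$. Note that $-1 \le f \le 1$. And for given $X_{01}$, $X_{02}$, we have

(3.49)  $\mathcal{E}[f(X_{01}, X_{02}; X_{11}, X_{12}; X_{21}, X_{22}) \mid X_{01}, X_{02}]$
        $= \Pr\{X_{01} \vee X_{02} < X_{11} \wedge X_{12} \text{ or } X_{01} \wedge X_{02} > X_{11} \vee X_{12} \mid X_{01}, X_{02}\}$
        $\;- \Pr\{X_{01} \vee X_{02} < X_{21} \wedge X_{22} \text{ or } X_{01} \wedge X_{02} > X_{21} \vee X_{22} \mid X_{01}, X_{02}\}$
        $= [1 - F_1(X_{01} \vee X_{02})]^2 + F_1^2(X_{01} \wedge X_{02}) - [1 - F_2(X_{01} \vee X_{02})]^2 - F_2^2(X_{01} \wedge X_{02})$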
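Thus $P_N$, the probability of classifying $\pi_0$ into $\pi_1$, is found from Corollary 3.2 to be

(3.50)  $P_N = \Pr\{\binom{n_0}{2}^{-1} \sum_{\alpha_1 < \alpha_2} ([1 - F_1(X_{0\alpha_1} \vee X_{0\alpha_2})]^2 + F_1^2(X_{0\alpha_1} \wedge X_{0\alpha_2}) - [1 - F_2(X_{0\alpha_1} \vee X_{0\alpha_2})]^2 - F_2^2(X_{0\alpha_1} \wedge X_{0\alpha_2})) < 0\} + O(N^{-1/2+\epsilon})$

for every $\epsilon > 0$ as $N \to \infty$.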
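The rule built on the Lehmann statistic, (3.44) to (3.47), can be sketched directly by brute-force enumeration of pairs. The Python below is only an illustration under that reading of the formulas; the function names are hypothetical, and the tie case $U_n = 0$ (which the text does not assign to either population) is returned as `None`.

```python
from itertools import combinations


def g(x1, x2, y1, y2):
    """Kernel (3.46): 1 if the pair (x1, x2) lies entirely on one side of (y1, y2)."""
    if max(x1, x2) < min(y1, y2) or min(x1, x2) > max(y1, y2):
        return 1
    return 0


def lehmann_v(x0, xi):
    """V_i of (3.44)-(3.45): average of g over all pairs from x0 and all pairs from xi."""
    pairs0 = list(combinations(x0, 2))
    pairsi = list(combinations(xi, 2))
    total = sum(g(a1, a2, b1, b2) for a1, a2 in pairs0 for b1, b2 in pairsi)
    return total / (len(pairs0) * len(pairsi))


def classify(x0, x1, x2):
    """Return 1 if U_n = V1 - V2 < 0, 2 if U_n > 0, and None on a tie."""
    u = lehmann_v(x0, x1) - lehmann_v(x0, x2)
    if u < 0:
        return 1
    if u > 0:
        return 2
    return None


# Toy data: x0 overlaps x1 and is far from x2, so pi_1 is the natural choice.
x0 = [0.1, 0.4, 0.7]
x1 = [0.0, 0.5, 0.9]
x2 = [5.0, 6.0, 7.0]
decision = classify(x0, x1, x2)
```

The enumeration costs on the order of $n_0^2 n_i^2$ kernel evaluations, which is fine for small illustrations but would need a rank-based shortcut for large samples.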

3.4  Closing Remark

Sen (1960), Hoeffding (1961), and Berk (1966) have the following lemma on the convergence of U-statistics.

Lemma 3.3  As $n \to \infty$, $U_n$ converges almost surely to the parameter it estimates unbiasedly.

The strong consistency of the rules mentioned above is then an immediate consequence.

Utilizing Hoeffding's inequality for bounded U-statistics (Hoeffding (1963), p. 25), we obtain an upper bound of the PMC of the rule which is based on the Lehmann statistic:

(3.51)  $\mathrm{PMC}_j \le e^{-2[n/2]\Delta^2}$,  $j = 1, 2$,

where the subscript $j$ indicates that the probability is calculated under the assumption that $\pi_0 = \pi_j$, and $[x]$ denotes the largest integer less than or equal to $x$.
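To get a feel for the size of the bound in (3.51), read here as $\exp(-2[n/2]\Delta^2)$, the following small Python helper evaluates it. The function name and the argument checks are illustrative assumptions; the range restriction on $\Delta$ comes from (3.43).

```python
import math


def lehmann_pmc_bound(n, delta):
    """Upper bound exp(-2 [n/2] Delta^2), with [.] the integer part."""
    if n < 2 or not (0 <= delta <= 1 / 3):
        raise ValueError("need n >= 2 and 0 <= Delta <= 1/3 (see (3.43))")
    return math.exp(-2 * (n // 2) * delta ** 2)


# Example: n = 100 and Delta = 0.2 give exp(-2 * 50 * 0.04) = exp(-4).
bound_100 = lehmann_pmc_bound(100, 0.2)
```

Because only $[n/2]$ enters, an odd sample size gives the same bound as the even size just below it.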



tial classification rules based on U-statistics with bounded kernels so that the sampling will terminate with probability one and the PMC's can be made smaller than any preassigned arbitrary positive constant. We have extracted the basic idea from the work by Hoeffding and Wolfowitz (1958) on distinguishability of sets of distributions. Later the notion of distinguishability was used by Das Gupta and Kinderman (1974) in the set-up for the classification problems. Hoeffding and Wolfowitz (1958) introduced the minimum distance test procedure and studied the properties of this test using the available probability bounds on the sample distance function. We shall introduce the minimum-U sequential rules and prove some properties of these rules by using the available probability inequality for U-statistics.

In the second part of this chapter we shall consider some sequential rules when $F_1 = N_p(\mu_1, \Sigma)$ and $F_2 = N_p(\mu_2, \Sigma)$. Following the idea of Chow and Robbins (1965) and Simons (1968), Srivastava (1973) proposed some sequential rules for the following two cases: (i) $\delta$ is known but $\Sigma$ is unknown, (ii) both $\delta$ and $\Sigma$ are unknown. For case (i) Srivastava (1973) proposed a sequential rule based on observations from $\pi_0$ and $\pi_1$ and, given $\alpha$, he showed that the PMC's of this rule tend to values less than $\alpha$ as $\delta'\Sigma^{-1}\delta \to 0$. Furthermore he modified his rule so that the prescribed errors can actually be achieved. He also proved the asymptotic efficiency of his rule. However, Srivastava's proof is incomplete and suffers from a technical error. We shall present a more rigorous analysis of his rule (not the modified one). Srivastava also proved that for his rule in case (ii) the errors can be controlled arbitrarily as $\delta'\Sigma^{-1}\delta \to 0$. However, his proof is entirely wrong and here we shall indicate his error. Unfortunately such an optimal result is not true in this case.

4.1  Minimum-U sequential rules.

As in Chapter 3, we want to decide between $F_0 = F_1$ and $F_0 = F_2$, and samples $\mathbf{X}_{0n_0}$, $\mathbf{X}_{1n_1}$, $\mathbf{X}_{2n_2}$ are available from the populations $\pi_0$, $\pi_1$, $\pi_2$, respectively.

We shall study the problem in the following set-up.

Consider  U-statistics  defined  by 


(4.1) 


■ 
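for $m \le n$; $n = \min\{n_0, n_1, n_2\}$, where the summation is over all possible combinations. Furthermore, assume that

(4.2)  $\mathcal{E}(V_1 \mid \mathbf{X}_{0n_0} \in \pi_1) = \theta_1$,  $\mathcal{E}(V_2 \mid \mathbf{X}_{0n_0} \in \pi_1) = \theta_1 + \theta$,
       $\mathcal{E}(V_1 \mid \mathbf{X}_{0n_0} \in \pi_2) = \theta_1 + \theta$,  $\mathcal{E}(V_2 \mid \mathbf{X}_{0n_0} \in \pi_2) = \theta_1$,

where $\theta > 0$. It is also assumed that $h$ is bounded; namely, there exist $d_1$ and $d_2$ ($-\infty < d_1 < d_2 < \infty$) such that $d_1 \le h \le d_2$. Then $d_1 \le V_1, V_2 \le d_2$.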

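To illustrate the above set-up, consider (Hudimoto, 1964)

(4.3)  $P_1 = (1/n^2) \sum\sum c(X_{0i} - X_{1j})$
       $P_2 = (1/n^2) \sum\sum c(X_{2k} - X_{0i})$

where $c(x) = 1$ for $x > 0$ and $c(x) = 0$ for $x \le 0$.

Hudimoto (1964) showed that

(4.4)  $\mathcal{E}(P_1 \mid \mathbf{X}_{0n_0} \in \pi_1) = \tfrac12$,  $\mathcal{E}(P_2 \mid \mathbf{X}_{0n_0} \in \pi_1) = \tfrac12 + \Delta$,
       $\mathcal{E}(P_1 \mid \mathbf{X}_{0n_0} \in \pi_2) = \tfrac12 + \Delta$,  $\mathcal{E}(P_2 \mid \mathbf{X}_{0n_0} \in \pi_2) = \tfrac12$,

where $\Delta = \int F_1\,dF_2 - \tfrac12 > 0$ (assuming $F_1$ and $F_2$ are distinct and continuous).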

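For another illustration, consider

(4.5)  $\Gamma_1 = \binom{n}{2}^{-1}\binom{n}{2}^{-1} \sum g(X_{0\alpha_1}, X_{0\alpha_2}; X_{1\beta_1}, X_{1\beta_2})$
       $\Gamma_2 = \binom{n}{2}^{-1}\binom{n}{2}^{-1} \sum g(X_{0\alpha_1}, X_{0\alpha_2}; X_{2\gamma_1}, X_{2\gamma_2})$

where $g$ is defined in (3.46). Then from Lemma 3.1 and Lemma 3.2, we get

(4.6)  $\mathcal{E}(\Gamma_1 \mid \mathbf{X}_{0n_0} \in \pi_1) = 1/3$,  $\mathcal{E}(\Gamma_2 \mid \mathbf{X}_{0n_0} \in \pi_1) = 1/3 + 2\Delta$,
       $\mathcal{E}(\Gamma_1 \mid \mathbf{X}_{0n_0} \in \pi_2) = 1/3 + 2\Delta$,  $\mathcal{E}(\Gamma_2 \mid \mathbf{X}_{0n_0} \in \pi_2) = 1/3$,

where $\Delta = \int (F_1 - F_2)^2 \, d\left(\frac{F_1 + F_2}{2}\right)$.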


As before, we write $U_n$ as

(4.7)  $U_n = V_1 - V_2$
       $= \binom{n}{m}^{-1}\binom{n}{m}^{-1}\binom{n}{m}^{-1} \sum f(X_{0\alpha_1}, \dots, X_{0\alpha_m}; X_{1\beta_1}, \dots, X_{1\beta_m}; X_{2\gamma_1}, \dots, X_{2\gamma_m})$,

where $f(X_{01}, \dots, X_{0m}; X_{11}, \dots, X_{1m}; X_{21}, \dots, X_{2m}) = h(X_{01}, \dots, X_{0m}; X_{11}, \dots, X_{1m}) - h(X_{01}, \dots, X_{0m}; X_{21}, \dots, X_{2m})$.

Note that $-a \le f \le a$; $a = d_2 - d_1 > 0$.

In Chapter 3, we have considered the rule which classifies $\pi_0$ into $\pi_1$ if $U_n < 0$ and into $\pi_2$ if $U_n > 0$. From Hoeffding's inequality (1963, p. 25) for bounded U-statistics, we obtain an upper bound of the PMC, which is given by

(4.8)  $\exp(-[n/m]\theta^2/2a^2)$,  $\theta > 0$.

We shall consider two cases: (i) $\theta_1$ is known but $\theta$ is unknown, (ii) both $\theta_1$ and $\theta$ are unknown.

4.1.1  Minimum-U sequential rule I: $\theta_1$ known.

Often $\theta_1$ in (4.2) is a known constant, as we have seen in (4.4) and (4.6) where $\theta_1 = \tfrac12$ and $\theta_1 = 1/3$, respectively. Without loss of generality we shall assume $\theta_1 = 0$. Then we define a sequential rule as follows:

First we choose a sequence $\{\alpha_n\}$ of positive constants such that

(4.9)  $\sum_{i=1}^{\infty} \alpha_i \le \beta$

and a sequence $\{C_n\}$ of positive numbers such that

(4.10)  $\lim_{n \to \infty} C_n = 0$, and $0 < C_n < d_2$ for all $n \ge 1$,

and a strictly increasing sequence $\{m_i\}$ of positive integers such that

(4.11)  $\exp(-[m_i/m]C_i^2/2a^2) \le \alpha_i$ for all $i \ge 1$ and $m_1 \ge m$.

Put

(4.12)  $\delta_i = \max\{V_1(\mathbf{X}_{0m_i}; \mathbf{X}_{1m_i}), V_2(\mathbf{X}_{0m_i}; \mathbf{X}_{2m_i})\}$

Take successive independent samples of sizes $m_1$, $m_2 - m_1$, $m_3 - m_2$, .... Continue sampling as long as $\delta_i < C_i$. Stop sampling as soon as $\delta_i \ge C_i$, and apply the terminal rule

(4.13)  $\varphi = 1$ if $V_1(\mathbf{X}_{0m_t}; \mathbf{X}_{1m_t}) < V_2(\mathbf{X}_{0m_t}; \mathbf{X}_{2m_t})$,
        $\varphi = 0$ otherwise.

We classify $\pi_0$ into $\pi_1$ or $\pi_2$ according to $\varphi = 1$ or $0$. Hence the sample size is

(4.14)  $N = m_t$,

where $t$ is the first integer $i$ for which $\delta_i \ge C_i$.

We shall denote this rule by $(N, \varphi)$. Following the argument of Hoeffding and Wolfowitz (1958), we get the following results.

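Proposition 4.1.  The rule $(N, \varphi)$ terminates with probability one.

Proof.  It suffices to show that $\Pr\{N < \infty \mid \pi_0 = \pi_1\} = 1$, since $\Pr\{N < \infty \mid \pi_0 = \pi_2\} = 1$ can be similarly proved.

(4.15)  $\Pr\{N > m_j \mid \pi_0 = \pi_1\} = \Pr\{\delta_i < C_i \text{ for } 1 \le i \le j \mid \pi_0 = \pi_1\}$
        $\le \Pr\{\delta_j < C_j \mid \pi_0 = \pi_1\}$
        $\le \Pr\{-V_2 + \mathcal{E}(V_2 \mid \pi_0 = \pi_1) > \mathcal{E}(V_2 \mid \pi_0 = \pi_1) - C_j \mid \pi_0 = \pi_1\}$
        $= \Pr\{-V_2 + \theta > \theta - C_j \mid \pi_0 = \pi_1\}$

Since $C_j \to 0$ and $\theta > 0$ we have $\theta - C_j > C_j$ for $j$ sufficiently large, and then, by Hoeffding's inequality for U-statistics, the right side of (4.15) is $\le \exp(-[m_j/m]C_j^2/2a^2) \le \alpha_j$ (see (4.11)). By (4.9), $\alpha_j \to 0$ as $j \to \infty$. Thus $\Pr\{N > m_j \mid \pi_0 = \pi_1\} \to 0$ as $j \to \infty$, which completes the proof.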

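Proposition 4.2.  Each of the PMC's of the rule $(N, \varphi)$ is less than $\beta$.

Proof.  Since $(N, \varphi)$ terminates we can write

$\mathrm{PMC}_2 = \mathcal{E}(\varphi \mid \pi_0 = \pi_2)$
  $= \sum_{i=1}^{\infty} \Pr\{\delta_j < C_j \text{ for } j < i,\ \delta_i \ge C_i,\ V_1 < V_2 \mid \pi_0 = \pi_2\}$
  $\le \sum_{i=1}^{\infty} \Pr\{V_2 \ge C_i \mid \pi_0 = \pi_2\}$
  $\le \sum_{i=1}^{\infty} \exp(-[m_i/m]C_i^2/2a^2)$  by Hoeffding's inequality
  $\le \sum_{i=1}^{\infty} \alpha_i \le \beta$.

Similarly,

$\mathrm{PMC}_1 = \mathcal{E}(1 - \varphi \mid \pi_0 = \pi_1) \le \beta$.

Furthermore, if we choose the sequence $\{m_i\}$ suitably, the moment generating function of $N$ will exist.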

Proposition 4.3.  If the sequence $\{m_i\}$ is so chosen that (in addition to (4.11))

(4.16)  $\liminf_{i \to \infty} i^{-1}(2m[m_i/m] - m_{i+1}) > 0$,

then for every $\theta > 0$, there is a positive constant $t(\theta)$ such that $\mathcal{E}(\exp(tN) \mid \pi_0 = \pi_i) < \infty$ for $t < t(\theta)$, $i = 1, 2$.

Proof.  We shall only prove that $\mathcal{E}(\exp(tN) \mid \pi_0 = \pi_1) < \infty$. Since $C_j \to 0$ as $j \to \infty$ and $\theta > 0$, there exists a positive integer $J$ such that $\theta - C_j > \theta/2$ for $j > J$. Therefore, for all $j > J$, due to (4.15) and Hoeffding's inequality, we have

(4.17)  $\Pr\{N > m_j \mid \pi_0 = \pi_1\} \le \exp(-[m_j/m](\theta/2)^2/2a^2) = \exp(-[m_j/m]\theta^2/8a^2)$

Now for any real $t$,

$\mathcal{E}(\exp(tN) \mid \pi_0 = \pi_1) = \sum_{j=1}^{\infty} \exp(tm_j)\Pr\{N = m_j \mid \pi_0 = \pi_1\}$
  $\le \exp(tm_1) + \sum_{j=1}^{\infty} \exp(tm_{j+1})\Pr\{N > m_j \mid \pi_0 = \pi_1\}$

Thus, from (4.17), $\mathcal{E}(\exp(tN) \mid \pi_0 = \pi_1) < \infty$ if the series $\sum_{j=1}^{\infty} \exp(tm_{j+1})\exp(-[m_j/m]\theta^2/8a^2)$ converges. Since $\theta > 0$ and $m$ is a positive integer, let $t(\theta) = \theta^2/16ma^2 > 0$. If $t < t(\theta)$, then

$tm_{j+1} - [m_j/m]\theta^2/8a^2 < -(\theta^2/16ma^2)(2m[m_j/m] - m_{j+1})$

so that the series $\sum_{j=1}^{\infty} \exp(tm_{j+1})\exp(-[m_j/m]\theta^2/8a^2)$ converges due to (4.16). The proof is complete.


Remark.  If, for a given $\beta > 0$, we choose $\alpha_j = \beta/2^j$ and $C_j = d j^{-1/2}$ where $0 < d < d_2$, then we may take $m_j = 2m(a^2/d^2)(j^2 \log 2 + j \log(1/\beta))$. Therefore the conditions (4.11) and (4.16) hold, so the moment generating function of $N$ exists.
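The choice in the Remark can be checked numerically. The sketch below is a hedged illustration in Python: it reads the Remark as $\alpha_j = \beta/2^j$, $C_j = d/\sqrt{j}$, and, as an added assumption not stated in the text, rounds $m_j$ up to the next multiple of $m$ so that $m_j$ is an integer and $[m_j/m]$ is at least $2(a/d)^2(j^2\log 2 + j\log(1/\beta))$. All names are hypothetical.

```python
import math


def remark_sequences(beta, d, a, m, stages):
    """Return (alphas, cs, ms) for j = 1..stages following the Remark."""
    alphas, cs, ms = [], [], []
    for j in range(1, stages + 1):
        alphas.append(beta / 2 ** j)
        cs.append(d / math.sqrt(j))
        # Real-valued number of blocks from the Remark, rounded up (assumption).
        blocks = 2 * (a / d) ** 2 * (j * j * math.log(2) + j * math.log(1 / beta))
        ms.append(m * math.ceil(blocks))
    return alphas, cs, ms


def satisfies_4_11(alphas, cs, ms, a, m):
    """Check exp(-[m_i/m] C_i^2 / (2 a^2)) <= alpha_i for all i, and m_1 >= m."""
    if ms[0] < m:
        return False
    return all(math.exp(-(m_i // m) * c_i ** 2 / (2 * a ** 2)) <= alpha_i
               for alpha_i, c_i, m_i in zip(alphas, cs, ms))


alphas, cs, ms = remark_sequences(beta=0.05, d=0.5, a=1.0, m=2, stages=10)
condition_holds = satisfies_4_11(alphas, cs, ms, a=1.0, m=2)
```

With these inputs the stage sizes grow roughly like $j^2$, which is also what makes $2m[m_j/m] - m_{j+1}$ grow in (4.16).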

4.1.2  Minimum-U sequential rule II: $\theta_1$ unknown.

Define the sequences $\{\alpha_i\}$, $\{C_i\}$, $\{m_i\}$ as before and put

(4.18)  $\Delta_i = |V_1(\mathbf{X}_{0m_i}; \mathbf{X}_{1m_i}) - V_2(\mathbf{X}_{0m_i}; \mathbf{X}_{2m_i})|$

Take samples of sizes $m_1$, $m_2 - m_1$, $m_3 - m_2$, ..., where the $m_i$'s are defined as in (4.11). Continue sampling as long as $\Delta_i < C_i$. Stop sampling as soon as $\Delta_i \ge C_i$ and apply the terminal rule $\varphi$ (see (4.13)). The sample size is given by

(4.19)  $N' = m_t$,

where $t$ is the first integer for which $\Delta_t \ge C_t$. We denote this rule by $(N', \varphi)$.
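The control flow of rule II can be sketched as follows. This is only an illustration in Python under stated assumptions: the statistics $V_1$, $V_2$ are passed in as functions, `draw(k)` is a hypothetical callback returning $k$ new observations from each of $\pi_0$, $\pi_1$, $\pi_2$, and the run is truncated after finitely many stages (the rule in the text has no such cap). The helper `proportion_below` is an illustrative two-sample U-statistic, not the statistic used in the text.

```python
def sequential_rule_two(draw, v1, v2, m_seq, c_seq):
    """Run rule II for at most len(m_seq) stages.

    Returns (N', population) when Delta_i >= C_i first occurs, else None.
    """
    x0, x1, x2 = [], [], []
    previous = 0
    for m_i, c_i in zip(m_seq, c_seq):
        # Stage sizes are m_1, m_2 - m_1, m_3 - m_2, ...
        new0, new1, new2 = draw(m_i - previous)
        x0 += new0
        x1 += new1
        x2 += new2
        previous = m_i
        first, second = v1(x0, x1), v2(x0, x2)
        if abs(first - second) >= c_i:       # Delta_i >= C_i: stop sampling
            # Terminal rule (4.13): pi_1 if V1 < V2, otherwise pi_2.
            return m_i, (1 if first < second else 2)
    return None


def proportion_below(x0, xi):
    """Illustrative U-statistic with kernel 1[x0 < xi], averaged over all pairs."""
    return sum(a < b for a in x0 for b in xi) / (len(x0) * len(xi))
```

A call such as `sequential_rule_two(draw, proportion_below, proportion_below, ms, cs)` then returns the stopping size and the chosen population, with `ms` and `cs` any sequences satisfying (4.10) and (4.11).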

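Proposition 4.1'  The rule $(N', \varphi)$ terminates with probability one.

Proof.  We shall only show $\Pr\{N' < \infty \mid \pi_0 = \pi_1\} = 1$.

(4.20)  $\Pr\{N' > m_j \mid \pi_0 = \pi_1\} = \Pr\{\Delta_i < C_i \text{ for } 1 \le i \le j \mid \pi_0 = \pi_1\}$
        $\le \Pr\{\Delta_j < C_j \mid \pi_0 = \pi_1\}$
        $= \Pr\{|V_1 - V_2| < C_j \mid \pi_0 = \pi_1\}$
        $= \Pr\{V_1 - V_2 < C_j,\ V_1 \ge V_2 \mid \pi_0 = \pi_1\} + \Pr\{V_2 - V_1 < C_j,\ V_1 < V_2 \mid \pi_0 = \pi_1\}$
        $\le \Pr\{V_1 - V_2 \ge 0 \mid \pi_0 = \pi_1\} + \Pr\{V_2 - V_1 < C_j \mid \pi_0 = \pi_1\}$
        $= \Pr\{V_1 - V_2 + \theta \ge \theta \mid \pi_0 = \pi_1\} + \Pr\{V_1 - V_2 + \theta > \theta - C_j \mid \pi_0 = \pi_1\}$
        $\le 2\Pr\{V_1 - V_2 + \theta > C_j \mid \pi_0 = \pi_1\}$

for $j$ sufficiently large. By following exactly the same argument as in Proposition 4.1, the right side of (4.20) tends to zero as $j \to \infty$. The proof is complete.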

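Proposition 4.2'  Each of the PMC's of the rule $(N', \varphi)$ is less than $\beta$.

Proof.  Since $(N', \varphi)$ terminates we can write

$\mathrm{PMC}_2 = \mathcal{E}(\varphi \mid \pi_0 = \pi_2)$
  $= \sum_{i=1}^{\infty} \Pr\{\Delta_j < C_j \text{ for } j < i,\ \Delta_i \ge C_i,\ V_1 < V_2 \mid \pi_0 = \pi_2\}$
  $\le \sum_{i=1}^{\infty} \Pr\{V_2 - V_1 \ge C_i \mid \pi_0 = \pi_2\}$
  $= \sum_{i=1}^{\infty} \Pr\{V_2 - V_1 + \theta \ge C_i + \theta \mid \pi_0 = \pi_2\}$
  $\le \sum_{i=1}^{\infty} \Pr\{V_2 - V_1 + \theta \ge C_i \mid \pi_0 = \pi_2\}$
  $\le \sum_{i=1}^{\infty} \exp(-[m_i/m]C_i^2/2a^2) \le \sum_{i=1}^{\infty} \alpha_i \le \beta$.

Similarly,

$\mathrm{PMC}_1 = \mathcal{E}(1 - \varphi \mid \pi_0 = \pi_1) \le \beta$.

In exactly the same way, we can prove that the moment generating function of $N'$ exists, if the sequence $\{m_i\}$ is chosen suitably.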

4.2.  Sequential rules for classification into one of two multivariate normal populations.

For convenience, we shall follow Srivastava's notations. The problem is to classify $N_p(\mu_0, \Sigma)$ into one of $N_p(\mu_1, \Sigma)$ and $N_p(\mu_2, \Sigma)$. When all the parameters are known and a sample of size $n$ is taken from $\pi_0$, the minimax rule is to classify $\pi_0$ into $\pi_1$ or $\pi_2$ according as

(4.21)  $\bar{X}_0'\Sigma^{-1}(\mu_1 - \mu_2) - \tfrac12(\mu_1 + \mu_2)'\Sigma^{-1}(\mu_1 - \mu_2) \gtrless 0$,

where $\bar{X}_0$ is the sample mean. The two PMC's, given by $e_{12}$ and $e_{21}$, are equal and their common value is given by

(4.22)  $e_{12} = e_{21} = 1 - \Phi(\tfrac12 n^{1/2} D)$,

where $\Phi(x) = \int_{-\infty}^{x} (2\pi)^{-1/2}\exp(-\tfrac12 y^2)\,dy$,

$D^2 = \delta'\Sigma^{-1}\delta$, and $\delta = \mu_1 - \mu_2$.

To control probabilities of misclassification, Srivastava (1973) proposed sequential rules in the following two cases:


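(4.24)  $mS_m = \sum_{i=0}^{1}\sum_{j=1}^{n} (X_{ij} - \bar{X}_{in})(X_{ij} - \bar{X}_{in})'$,  $m = 2(n-1)$,

where $\{X_{ij}\}$ is a sequence of mutually independent random $p$-vectors from $N_p(\mu_i, \Sigma)$, $i = 0, 1$. Then a stopping variable $N$ is defined by

(4.25)  $N$ = the smallest integer $n$ ($\ge n_0$) such that $n \ge 8a^2/\delta'S_m^{-1}\delta$,

where $2n_0 > p + 2$. When sampling is stopped at $N = n$, classify $\pi_0$ into $\pi_1$ or $\pi_2$ according as

(4.26)  $(\bar{X}_{0n} - \bar{X}_{1n} + \tfrac12\delta)'S_m^{-1}\delta \gtrless 0$.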


Define 

(4.27)  *£J  nYln-  &1l} 


0 

0 


i" 

'■  J J 


B 

l 2J 


Then 


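(4.28)  $\delta'S_m^{-1}\delta = \delta'(\Sigma^{1/2}S_m^*\Sigma^{1/2})^{-1}\delta = (\Sigma^{-1/2}\delta)'S_m^{*-1}(\Sigma^{-1/2}\delta) = \delta^{*\prime}S_m^{*-1}\delta^*$,

where $\delta^* = \Sigma^{-1/2}\delta$. Note that $mS_m^*$ is distributed as $W_p(m, I)$.

Now we shall obtain some asymptotic properties of this sequential rule as $D \to 0$ (or $\delta^* \to 0$). From (4.25) and (4.28),

$N = N(\delta^*)$ = the smallest $n \ge n_0$ such that $n \ge 8a^2/\delta^{*\prime}S_m^{*-1}\delta^*$

or

(4.29)  $(1/m)\,\delta^{*\prime}\delta^*/\delta^{*\prime}(mS_m^*)^{-1}\delta^* \le n\,\delta^{*\prime}\delta^*/8a^2$

Since for $\delta^* \ne 0$, $\delta^{*\prime}\delta^*/\delta^{*\prime}(mS_m^*)^{-1}\delta^* \sim \chi^2_{m-p+1}$,

(4.30)  $(1/m)\,\delta^{*\prime}\delta^*/\delta^{*\prime}(mS_m^*)^{-1}\delta^* \to 1$ a.s. as $n \to \infty$.

From (4.29) and Lemma 1 of Chow and Robbins (1965), we have the following.

Lemma 4.1.  (i) $N \to \infty$ a.s. as $\delta^* \to 0$.

(ii) $N/(8a^2/\delta^{*\prime}\delta^*) \to 1$ a.s. as $\delta^* \to 0$.

Note that the rule is now studied in terms of the $Y_{ij}$'s and the a.s. convergence as $\delta^* \to 0$ is meaningful (contrary to Srivastava's development).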
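The benchmark $8a^2/\delta^{*\prime}\delta^*$ in Lemma 4.1 can be read as the fixed sample size that would be needed if $\Sigma$ were known: replacing $S_M^*$ by $I$ in the expression $1 - \Phi[(N/8)^{1/2}\dots]$ of (4.31) gives an error of $1 - \Phi((n/8)^{1/2}D)$, which equals $\alpha = 1 - \Phi(a)$ at $n = 8a^2/D^2$. The Python sketch below evaluates this under that assumption; the function names are illustrative, and `statistics.NormalDist` from the standard library supplies $\Phi$ and its inverse.

```python
import math
from statistics import NormalDist


def benchmark_sample_size(alpha, d_squared):
    """Smallest integer n with n >= 8 a^2 / D^2, where 1 - Phi(a) = alpha."""
    a = NormalDist().inv_cdf(1 - alpha)
    return math.ceil(8 * a * a / d_squared)


def pmc_known_sigma(n, d_squared):
    """1 - Phi(sqrt(n/8) D): error at fixed n when Sigma is treated as known."""
    return 1 - NormalDist().cdf(math.sqrt(n * d_squared / 8))


n_needed = benchmark_sample_size(alpha=0.05, d_squared=0.25)
achieved = pmc_known_sigma(n_needed, 0.25)
```

As $D^2$ decreases the required size grows like $1/D^2$, which is the regime $\delta^* \to 0$ studied in the lemmas.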
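Lemma 4.2.  (Asymptotic efficiency)  $\lim_{\delta^* \to 0} \mathcal{E}N/(8a^2/\delta^{*\prime}\delta^*) = 1$

Proof.  It is enough to show that $\{N\delta^{*\prime}\delta^*\}_{\delta^{*\prime}\delta^* > 0}$ is uniformly integrable. According to a result of Bickel and Yahav (Lemma 3.2, 1968), it is sufficient to prove that

$\sum_{k=1}^{\infty} \sup_{0 < \delta^{*\prime}\delta^* < \epsilon} \Pr\{N\delta^{*\prime}\delta^* > k\} < \infty$ for some $\epsilon > 0$.

Now, for $0 < \delta^{*\prime}\delta^* < \epsilon$,

$\Pr\{N\delta^{*\prime}\delta^* > k\} = \Pr\{N > k/\delta^{*\prime}\delta^*\}$
  $\le \Pr\{N > k(\delta^*)\}$, where $k(\delta^*) = [k/\delta^{*\prime}\delta^*]$
  $\le \Pr\{k(\delta^*) < 8a^2/\delta^{*\prime}S_f^{*-1}\delta^*\}$, where $f = 2(k(\delta^*) - 1)$
  $= \Pr\{\delta^{*\prime}\delta^*/\delta^{*\prime}(fS_f^*)^{-1}\delta^* > f\,\delta^{*\prime}\delta^*\,k(\delta^*)/8a^2\}$
  $\le (64a^4/k^2(\delta^*)(\delta^{*\prime}\delta^*)^2) \cdot \mathcal{E}(\chi^2_{f-p+1})^2/f^2$
  $\le (64a^4/(k - \delta^{*\prime}\delta^*)^2) \cdot (f-p+1)(f-p+3)/f^2$
  $\le 64a^4/(k - \epsilon)^2$

for $\epsilon$ sufficiently small.

Hence

$\sum_{k=1}^{\infty} \sup_{0 < \delta^{*\prime}\delta^* < \epsilon} \Pr\{N\delta^{*\prime}\delta^* > k\} < \infty$,

which completes the proof.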

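Let $e_{ij} = \Pr\{\text{classifying } \pi_0 \text{ into } \pi_j \mid \pi_0 = \pi_i\}$. Then

Theorem 4.1.  $\lim_{\delta^* \to 0} e_{12} = \lim_{\delta^* \to 0} e_{21} = \alpha$.

Proof.  We shall only prove for $e_{12}$. Let $M = 2(N-1)$; we have

(4.31)  $e_{12} = \Pr\{(\bar{X}_{0N} - \bar{X}_{1N} + \tfrac12\delta)'S_M^{-1}\delta < 0 \mid \pi_0 = \pi_1\}$
        $= \Pr\{(N/2)^{1/2}(\bar{X}_{0N} - \bar{X}_{1N})'S_M^{-1}\delta/(\delta'S_M^{-1}\Sigma S_M^{-1}\delta)^{1/2} < -\tfrac12(N/2)^{1/2}\delta'S_M^{-1}\delta/(\delta'S_M^{-1}\Sigma S_M^{-1}\delta)^{1/2} \mid \pi_0 = \pi_1\}$
        $= 1 - \mathcal{E}\,\Phi[(N/8)^{1/2}\,\delta^{*\prime}S_M^{*-1}\delta^*/(\delta^{*\prime}S_M^{*-2}\delta^*)^{1/2}]$
        $= 1 - \mathcal{E}\,\Phi[a\,(N/(8a^2/\delta^{*\prime}\delta^*))^{1/2} \cdot (\delta^{*\prime}S_M^{*-1}\delta^*/\delta^{*\prime}\delta^*) \cdot (\delta^{*\prime}\delta^*/\delta^{*\prime}S_M^{*-2}\delta^*)^{1/2}]$

Now for any orthogonal matrix $L$ we can write

(4.32)  $\delta^{*\prime}S_M^{*-2}\delta^* = \delta^{*\prime}L'LS_M^{*-1}L'LS_M^{*-1}L'L\delta^* = (L\delta^*)'A^{-1}A^{-1}(L\delta^*)$,  $A = LS_M^*L'$.

If we choose $L$ with first row as $\delta^{*\prime}/(\delta^{*\prime}\delta^*)^{1/2}$, then

(4.33)  $\delta^{*\prime}S_M^{*-2}\delta^* = \delta^{*\prime}\delta^*\,(\text{first row of } A^{-1})(\text{first column of } A^{-1})$

Therefore,

$\delta^{*\prime}S_M^{*-2}\delta^*/\delta^{*\prime}\delta^* = (\text{first row of } A^{-1})(\text{first column of } A^{-1})$

Since $MA \sim W_p(M, I)$, $A \to I_p$ a.s. as $\delta^* \to 0$. Hence

$\delta^{*\prime}S_M^{*-2}\delta^*/\delta^{*\prime}\delta^* \to 1$ a.s. as $\delta^* \to 0$.

Also we have seen from (4.30) and Lemma 4.1 that $N/(8a^2/\delta^{*\prime}\delta^*) \to 1$ and $\delta^{*\prime}S_M^{*-1}\delta^*/\delta^{*\prime}\delta^* \to 1$ a.s. as $\delta^* \to 0$.

By the dominated convergence theorem, from (4.31) we have

$\lim_{\delta^* \to 0} e_{12} = 1 - \Phi(a) = \alpha$.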

Case II.  Both $\delta$ and $\Sigma$ are unknown.

Now sampling is carried out sequentially from $\pi_2$ as well. Let

(4.34)  $tW_t = \sum_{i=0}^{2}\sum_{j=1}^{n} (X_{ij} - \bar{X}_{in})(X_{ij} - \bar{X}_{in})'$,  $t = 3(n-1)$,

where $n\bar{X}_{2n} = \sum_{j=1}^{n} X_{2j}$. Then the sampling rule is

(4.35)  $N'$ = the smallest integer $n$ ($\ge n_0$) such that $n \ge 6a^2/\delta_n'W_t^{-1}\delta_n$,

where $\delta_n = \bar{X}_{1n} - \bar{X}_{2n}$. When the sampling is stopped at $N' = n$, classify $\pi_0$ into $\pi_1$ or $\pi_2$ according as

(4.36)  $[\bar{X}_{0n} - \tfrac12(\bar{X}_{1n} + \bar{X}_{2n})]'W_t^{-1}\delta_n \gtrless 0$.
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(It. 37)  z1j  n2in  ■ ^Zlj  * 

1=0  j=l 

s;  * V\»  - " ^V6* 

Then

(4.38)  $\delta_n'W_t^{-1}\delta_n = (\delta^* + \delta_n^*)'W_t^{*-1}(\delta^* + \delta_n^*)$
        $= (n/2)(\delta^* + \delta_n^*)'(tW_t^*)^{-1}(\delta^* + \delta_n^*)\,(t/(n/2))$
        $= (U/V)(t/(n/2))$,

where $U$ and $V$ are mutually independent and $U \sim \chi^2_p((n/2)\delta^{*\prime}\delta^*)$, $V \sim \chi^2_{t-p+1}$ (see Anderson (1958), Theorem 5.2.2, p. 106).

Note that $U/V$ is stochastically larger than $\chi^2_p/V$. Therefore, if we define

$N^*$ = the first $n$ ($\ge n_0$) such that $n \ge 6a^2/[(\chi^2_p/V)(t/(n/2))]$,

then $N^*$ is stochastically larger than $N'$, and $N^*$ is independent of $\delta^*$. It is clear that there is positive probability that $N^*$ is finite. Hence it is not true that

$\lim_{\delta^* \to 0} N' = \infty$ a.s.,

which is the error in Srivastava's argument.
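APPENDIX

DERIVATIONS TO OBTAIN A TAYLOR SERIES EXPANSION OF THE FUNCTION $\Psi(\cdot)$

Let

(A.1)  $\varphi_2(x, y; r) = (1/2\pi)(1-r^2)^{-1/2}\exp\{-\tfrac12(x^2 - 2rxy + y^2)/(1-r^2)\}$,  $|r| < 1$

(A.2)  $\Psi(u, v; r) = \int_v^{\infty}\int_u^{\infty} \varphi_2(x, y; r)\,dx\,dy$.

Then

(A.3)  $\partial\Psi/\partial u = -\int_v^{\infty} (1/2\pi)(1-r^2)^{-1/2}\exp\{-\tfrac12(u^2 - 2ruy + y^2)/(1-r^2)\}\,dy$
       $= -\int_v^{\infty} (1/2\pi)(1-r^2)^{-1/2}\exp\{-\tfrac12(y^2 - 2ruy + u^2r^2 + u^2(1-r^2))/(1-r^2)\}\,dy$
       $= -\int_v^{\infty} (2\pi(1-r^2))^{-1/2}\exp\{-\tfrac12(y - ur)^2/(1-r^2)\}\,(2\pi)^{-1/2}e^{-u^2/2}\,dy$
       $= -\varphi_1(u)[1 - \Phi_1((v - ur)/(1-r^2)^{1/2})]$

where

(A.4)  $\varphi_1(u) = (2\pi)^{-1/2}e^{-u^2/2}$

(A.5)  $\Phi_1(u) = \int_{-\infty}^{u} \varphi_1(x)\,dx$

Similarly,

(A.6)  $\partial\Psi/\partial v = -\varphi_1(v)[1 - \Phi_1((u - vr)/(1-r^2)^{1/2})]$

and for $0 < |r| < 1$

(A.7)  $\partial\Psi/\partial r = \int_v^{\infty}\int_u^{\infty} \partial/\partial r\,(\varphi_2(x, y; r))\,dx\,dy$
       $= \int_v^{\infty}\int_u^{\infty} \partial^2/\partial x\,\partial y\,(\varphi_2(x, y; r))\,dx\,dy$
       $= \varphi_2(u, v; r)$

(A.8)  $\partial^2\Psi/\partial u^2 = \partial/\partial u\,(-\varphi_1(u)[1 - \Phi_1((v - ur)/(1-r^2)^{1/2})])$
       $= (1 - \Phi_1((v - ur)/(1-r^2)^{1/2}))\varphi_1(u)u + \varphi_1(u)\varphi_1((v - ur)/(1-r^2)^{1/2})(-r/(1-r^2)^{1/2})$
       $= \varphi_1(u)[-\varphi_1((v - ur)/(1-r^2)^{1/2})\,r/(1-r^2)^{1/2} + u(1 - \Phi_1((v - ur)/(1-r^2)^{1/2}))]$

(A.9)  $\partial^2\Psi/\partial v^2 = \varphi_1(v)[-\varphi_1((u - vr)/(1-r^2)^{1/2})\,r/(1-r^2)^{1/2} + v(1 - \Phi_1((u - vr)/(1-r^2)^{1/2}))]$

(A.10)  $\partial^2\Psi/\partial r^2 = \partial/\partial r\,(\varphi_2(u, v; r)) = \partial^2/\partial v\,\partial u\,(\varphi_2(u, v; r))$
        $= \partial/\partial v\,(-\varphi_2(u, v; r)(u - vr)/(1-r^2))$
        $= \varphi_2(u, v; r)[r + (u - vr)(v - ur)/(1-r^2)]/(1-r^2)$

(A.11)  $\partial^2\Psi/\partial v\,\partial u = \varphi_1(u)\varphi_1((v - ur)/(1-r^2)^{1/2})/(1-r^2)^{1/2}$

(A.12)  $\partial^2\Psi/\partial r\,\partial u = \varphi_1(u)\varphi_1((v - ur)/(1-r^2)^{1/2})(-u + vr)/(1-r^2)^{3/2}$

(A.13)  $\partial^2\Psi/\partial u\,\partial v = \varphi_1(v)\varphi_1((u - vr)/(1-r^2)^{1/2})/(1-r^2)^{1/2}$

(A.14)  $\partial^2\Psi/\partial r\,\partial v = \varphi_1(v)\varphi_1((u - vr)/(1-r^2)^{1/2})(-v + ur)/(1-r^2)^{3/2}$

(A.15)  $\partial^2\Psi/\partial u\,\partial r = -\varphi_2(u, v; r)(u - vr)/(1-r^2)$

(A.16)  $\partial^2\Psi/\partial v\,\partial r = -\varphi_2(u, v; r)(v - ur)/(1-r^2)$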
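The pairs (A.12), (A.15) and (A.14), (A.16) give the same mixed partial derivatives in two forms; they agree because $\varphi_1(u)\varphi_1((v - ur)/(1-r^2)^{1/2})/(1-r^2)^{1/2} = \varphi_2(u, v; r)$. A short numerical check in Python (illustrative function names, not from the source) is sketched below.

```python
import math


def phi1(u):
    """Standard normal density, (A.4)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def phi2(x, y, r):
    """Standard bivariate normal density with correlation r, (A.1)."""
    q = (x * x - 2 * r * x * y + y * y) / (1 - r * r)
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(1 - r * r))


def mixed_partial_gap(u, v, r):
    """Difference between the right sides of (A.12) and (A.15)."""
    s = math.sqrt(1 - r * r)
    a12 = phi1(u) * phi1((v - u * r) / s) * (-u + v * r) / s ** 3
    a15 = -phi2(u, v, r) * (u - v * r) / (1 - r * r)
    return a12 - a15


gaps = [mixed_partial_gap(u, v, r)
        for u in (-1.0, 0.3, 2.0) for v in (-0.5, 1.2) for r in (-0.6, 0.2, 0.8)]
```

All entries of `gaps` should be zero up to floating-point rounding; swapping the roles of $u$ and $v$ checks (A.14) against (A.16) in the same way.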

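Then (as a Taylor series expansion) for real $a$, $b$, and $|\rho| < 1$ we have

(A.17)  $\Psi(u, v; r) = \Psi(a, b; \rho)$
        $+ [-\varphi_1(a)(1 - \Phi_1((b - a\rho)/(1-\rho^2)^{1/2}))](u - a)$
        $+ [-\varphi_1(b)(1 - \Phi_1((a - b\rho)/(1-\rho^2)^{1/2}))](v - b)$
        $+ \varphi_2(a, b; \rho)(r - \rho)$
        $+ \tfrac12\varphi_1(a)[-\varphi_1((b - a\rho)/(1-\rho^2)^{1/2})\,\rho/(1-\rho^2)^{1/2} + a(1 - \Phi_1((b - a\rho)/(1-\rho^2)^{1/2}))](u - a)^2$
        $+ \tfrac12\varphi_1(b)[-\varphi_1((a - b\rho)/(1-\rho^2)^{1/2})\,\rho/(1-\rho^2)^{1/2} + b(1 - \Phi_1((a - b\rho)/(1-\rho^2)^{1/2}))](v - b)^2$
        $+ \tfrac12\varphi_2(a, b; \rho)[\rho + (a - b\rho)(b - a\rho)/(1-\rho^2)](r - \rho)^2/(1-\rho^2)$
        $+ \tfrac12[\varphi_1(a)\varphi_1((b - a\rho)/(1-\rho^2)^{1/2}) + \varphi_1(b)\varphi_1((a - b\rho)/(1-\rho^2)^{1/2})](u - a)(v - b)/(1-\rho^2)^{1/2}$
        $+ \tfrac12[\varphi_1(a)\varphi_1((b - a\rho)/(1-\rho^2)^{1/2})(-a + b\rho)/(1-\rho^2)^{3/2} - \varphi_2(a, b; \rho)(a - b\rho)/(1-\rho^2)](u - a)(r - \rho)$
        $+ \tfrac12[\varphi_1(b)\varphi_1((a - b\rho)/(1-\rho^2)^{1/2})(-b + a\rho)/(1-\rho^2)^{3/2} - \varphi_2(a, b; \rho)(b - a\rho)/(1-\rho^2)](v - b)(r - \rho)$
        $+ R$

where $R$ is a remainder term. Simplifying (A.17) we have

(A.18)  $\Psi(u, v; r) = \Psi(a, b; \rho)$
        $+ \varphi_1(a)[1 - \Phi_1((b - a\rho)/(1-\rho^2)^{1/2})][\tfrac12 a(u - a)^2 - (u - a)]$
        $+ \varphi_1(b)[1 - \Phi_1((a - b\rho)/(1-\rho^2)^{1/2})][\tfrac12 b(v - b)^2 - (v - b)]$
        $+ \varphi_1(a)\varphi_1((b - a\rho)/(1-\rho^2)^{1/2})[-\tfrac12\rho(u - a)^2 + \tfrac12(u - a)(v - b) - \tfrac12(a - b\rho)(u - a)(r - \rho)/(1-\rho^2)]/(1-\rho^2)^{1/2}$
        $+ \varphi_1(b)\varphi_1((a - b\rho)/(1-\rho^2)^{1/2})[-\tfrac12\rho(v - b)^2 + \tfrac12(u - a)(v - b) - \tfrac12(b - a\rho)(v - b)(r - \rho)/(1-\rho^2)]/(1-\rho^2)^{1/2}$
        $+ \varphi_2(a, b; \rho)[(r - \rho) + \tfrac12(r - \rho)^2(\rho + (a - b\rho)(b - a\rho)/(1-\rho^2))/(1-\rho^2) - \tfrac12(a - b\rho)(u - a)(r - \rho)/(1-\rho^2) - \tfrac12(b - a\rho)(v - b)(r - \rho)/(1-\rho^2)]$
        $+ R$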

1 

in 


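For $a = \tfrac12\alpha_{22}^{1/2}$, $b = \tfrac12(\alpha_{22} - \alpha_{33})/(\alpha_{22} + \alpha_{33} - 2\alpha_{23})^{1/2}$, $\rho = (\alpha_{22} - \alpha_{23})/[\alpha_{22}(\alpha_{22} + \alpha_{33} - 2\alpha_{23})]^{1/2}$ and $|\rho| < 1$ we have

(A.19)  $(b - a\rho)/(1-\rho^2)^{1/2} = \tfrac12\alpha_{22}^{1/2}(\alpha_{23} - \alpha_{33})/(\alpha_{22}\alpha_{33} - \alpha_{23}^2)^{1/2}$

(A.20)  $(a - b\rho)/(1-\rho^2)^{1/2} = \tfrac12(2\alpha_{22}\alpha_{33} - \alpha_{22}\alpha_{23} - \alpha_{33}\alpha_{23})/[(\alpha_{22} + \alpha_{33} - 2\alpha_{23})(\alpha_{22}\alpha_{33} - \alpha_{23}^2)]^{1/2}$

(A.21)  $(1-\rho^2)^{-1/2} = \alpha_{22}^{1/2}(\alpha_{22} + \alpha_{33} - 2\alpha_{23})^{1/2}/(\alpha_{22}\alpha_{33} - \alpha_{23}^2)^{1/2}$

(A.22)  $\rho/(1-\rho^2)^{1/2} = (\alpha_{22} - \alpha_{23})/(\alpha_{22}\alpha_{33} - \alpha_{23}^2)^{1/2}$

(A.23)  $(b - a\rho)/(1-\rho^2) = \tfrac12\alpha_{22}(\alpha_{22} + \alpha_{33} - 2\alpha_{23})^{1/2}(\alpha_{23} - \alpha_{33})/(\alpha_{22}\alpha_{33} - \alpha_{23}^2)$

(A.24)  $(a - b\rho)/(1-\rho^2) = \tfrac12\alpha_{22}^{1/2}(2\alpha_{22}\alpha_{33} - \alpha_{22}\alpha_{23} - \alpha_{33}\alpha_{23})/(\alpha_{22}\alpha_{33} - \alpha_{23}^2)$

(A.25)  $(a - b\rho)(b - a\rho)/(1-\rho^2) = (1/4)\,\alpha_{22}^{1/2}(\alpha_{23} - \alpha_{33})(2\alpha_{22}\alpha_{33} - \alpha_{22}\alpha_{23} - \alpha_{33}\alpha_{23})/[(\alpha_{22} + \alpha_{33} - 2\alpha_{23})^{1/2}(\alpha_{22}\alpha_{33} - \alpha_{23}^2)]$

(A.26)  $(\rho + (a - b\rho)(b - a\rho)/(1-\rho^2))/(1-\rho^2) = [\alpha_{22}(\alpha_{22} + \alpha_{33} - 2\alpha_{23})/(\alpha_{22}\alpha_{33} - \alpha_{23}^2)]\,[(\alpha_{22} - \alpha_{23})/(\alpha_{22}(\alpha_{22} + \alpha_{33} - 2\alpha_{23}))^{1/2} + (1/4)\,\alpha_{22}^{1/2}(\alpha_{23} - \alpha_{33})(2\alpha_{22}\alpha_{33} - \alpha_{22}\alpha_{23} - \alpha_{33}\alpha_{23})/((\alpha_{22} + \alpha_{33} - 2\alpha_{23})^{1/2}(\alpha_{22}\alpha_{33} - \alpha_{23}^2))]$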
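The substitutions leading to (A.19), (A.20) and (A.22) can be verified numerically. The Python sketch below assumes the readings $a = \tfrac12\alpha_{22}^{1/2}$, $b = \tfrac12(\alpha_{22} - \alpha_{33})/(\alpha_{22} + \alpha_{33} - 2\alpha_{23})^{1/2}$ and $\rho = (\alpha_{22} - \alpha_{23})/[\alpha_{22}(\alpha_{22} + \alpha_{33} - 2\alpha_{23})]^{1/2}$; these are reconstructed from a partly illegible original and should be treated as assumptions, as should the function name.

```python
import math


def substitution_gaps(a22, a33, a23):
    """Return left-minus-right differences for (A.19), (A.20) and (A.22)."""
    t = a22 + a33 - 2 * a23           # alpha_22 + alpha_33 - 2 alpha_23
    det = a22 * a33 - a23 ** 2        # alpha_22 alpha_33 - alpha_23^2
    a = 0.5 * math.sqrt(a22)
    b = 0.5 * (a22 - a33) / math.sqrt(t)
    rho = (a22 - a23) / math.sqrt(a22 * t)
    s = math.sqrt(1 - rho ** 2)
    gap19 = (b - a * rho) / s - 0.5 * math.sqrt(a22) * (a23 - a33) / math.sqrt(det)
    gap20 = (a - b * rho) / s - 0.5 * (2 * a22 * a33 - a22 * a23 - a33 * a23) / math.sqrt(t * det)
    gap22 = rho / s - (a22 - a23) / math.sqrt(det)
    return gap19, gap20, gap22


gaps_example = substitution_gaps(2.0, 3.0, 0.5)
```

The inputs must satisfy $\alpha_{22}\alpha_{33} > \alpha_{23}^2$ (Cauchy-Schwarz with non-proportional vectors), which is what makes $|\rho| < 1$.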


Next we are going to compute some expectations. Consider the matrix $V$ defined by

(A.27)  $(1/m)A = I + (1/m^{1/2})V$, where $A \sim W_p(I, m)$,

and for fixed $\eta_2$, $\eta_3$ we define

(A.28)  $\alpha_{22} = \eta_2'\eta_2$,  $\alpha_{33} = \eta_3'\eta_3$,  $\alpha_{23} = \eta_2'\eta_3$

Lemma A.1  $\mathcal{E}V^2 = (p+1)I$

Proof.  By definition of $V$,

$V = m^{1/2}((1/m)A - I) = (1/m^{1/2})(A - mI)$

$V^2 = (1/m)(A^2 - 2mA + m^2I)$

We can write $A$ as $A = \sum_{i=1}^{m} Z_iZ_i'$, where the $Z_i$'s are identically independently distributed as $N_p(0, I)$.

Then

$A^2 = (\sum_{i=1}^{m} Z_iZ_i')(\sum_{j=1}^{m} Z_jZ_j') = \sum_{i=1}^{m} Z_iZ_i'Z_iZ_i' + \sum_{i \ne j} Z_iZ_i'Z_jZ_j'$

$\mathcal{E}A^2 = m(p+2)I + m(m-1)I = m(m+p+1)I$

$\mathcal{E}V^2 = (1/m)(m(m+p+1)I - 2m^2I + m^2I) = (p+1)I$
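Lemma A.2  (i) $\mathcal{E}(\eta_2'V\eta_2)^2 = 2(\eta_2'\eta_2)^2 = 2\alpha_{22}^2$

(ii) $\mathcal{E}(\eta_3'V\eta_3)^2 = 2\alpha_{33}^2$

Proof.  By definition,

$V = (1/m^{1/2})(A - mI)$

$(\eta_2'V\eta_2)^2 = (1/m)(\eta_2'A\eta_2 - m\eta_2'\eta_2)^2 = (1/m)(\eta_2'\eta_2)^2(\eta_2'A\eta_2/\eta_2'\eta_2 - m)^2$

Since $\eta_2'A\eta_2/\eta_2'\eta_2 \sim \chi^2_m$ we have

$\mathcal{E}(\eta_2'V\eta_2)^2 = (1/m)(\eta_2'\eta_2)^2 \cdot 2m = 2(\eta_2'\eta_2)^2 = 2\alpha_{22}^2$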

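Lemma A.3  $\mathcal{E}\,\eta_2'V\eta_2\,\eta_3'V\eta_3 = 2(\eta_2'\eta_3)^2 = 2\alpha_{23}^2$

Proof.  We can express $\eta_3$ as

$\eta_3 = k\eta_2 + e$

where $k\eta_2$ is the orthogonal projection of $\eta_3$ on $\varphi(\eta_2)$, the space generated by $\eta_2$, and $e$ is the perpendicular of $\eta_3$ with respect to $\varphi(\eta_2)$. The coefficient $k$ is known to be $\eta_2'\eta_3/\eta_2'\eta_2$. Then

(A.29)  $\eta_2'V\eta_2\,\eta_3'V\eta_3 = \eta_2'V\eta_2\,(k\eta_2 + e)'V(k\eta_2 + e)$
        $= k^2(\eta_2'V\eta_2)^2 + 2k\,\eta_2'V\eta_2\,\eta_2'Ve + \eta_2'V\eta_2\,e'Ve$

Now $V = (1/m^{1/2})(A - mI)$, $A \sim W_p(I, m)$. Since $\eta_2$ and $e$ are mutually orthogonal we can choose an orthogonal matrix $L$ with first row as $\eta_2'/(\eta_2'\eta_2)^{1/2}$ and second row as $e'/(e'e)^{1/2}$, and define $A^*$ as $A^* = LAL'$; $A^*$ is again distributed as $W_p(I, m)$. Then the $(1,1)$th, $(1,2)$th, $(2,2)$th elements of $A^*$ are

(A.30)  $a^*_{11} = \eta_2'A\eta_2/(\eta_2'\eta_2)$,  $a^*_{12} = \eta_2'Ae/((\eta_2'\eta_2)^{1/2}(e'e)^{1/2})$,  $a^*_{22} = e'Ae/(e'e)$

$a^*_{11}$ and $a^*_{22}$ are independent because $A^* \sim W_p(I, m)$. Therefore so are $\eta_2'V\eta_2$ and $e'Ve$. Consequently,

$\mathcal{E}\,\eta_2'V\eta_2\,e'Ve = 0$.

Furthermore, we can write

$a^*_{11} = \sum_{i=1}^{m} Z_{i1}^2$,  $a^*_{12} = \sum_{i=1}^{m} Z_{i1}Z_{i2}$,

where the $Z_{ij}$'s are identically independently distributed as $N(0, 1)$, $i = 1, \dots, m$, $j = 1, 2, \dots$. Then $\mathcal{E}(a^*_{11}a^*_{12}) = 0$, which entails $\mathcal{E}\,\eta_2'A\eta_2\,\eta_2'Ae = 0$. Hence

(A.31)  $\mathcal{E}\,\eta_2'V\eta_2\,\eta_2'Ve = \mathcal{E}\,\eta_2'((1/m^{1/2})(A - mI))\eta_2\,\eta_2'((1/m^{1/2})(A - mI))e$
        $= (1/m)\,\mathcal{E}(\eta_2'A\eta_2\,\eta_2'Ae - m\,\eta_2'\eta_2\,\eta_2'Ae)$
        $= 0$

because $\mathcal{E}A = mI$ and $\eta_2'e = 0$. Thus from (A.29), we have

$\mathcal{E}\,\eta_2'V\eta_2\,\eta_3'V\eta_3 = k^2\,\mathcal{E}(\eta_2'V\eta_2)^2 = k^2 \cdot 2(\eta_2'\eta_2)^2 = (\eta_2'\eta_3/\eta_2'\eta_2)^2 \cdot 2(\eta_2'\eta_2)^2$
  $= 2(\eta_2'\eta_3)^2 = 2\alpha_{23}^2$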

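Lemma A.4  (i) $\mathcal{E}\,\eta_2'V\eta_2\,\eta_2'V\eta_3 = 2(\eta_2'\eta_2)(\eta_2'\eta_3) = 2\alpha_{22}\alpha_{23}$

(ii) $\mathcal{E}\,\eta_3'V\eta_3\,\eta_2'V\eta_3 = 2(\eta_3'\eta_3)(\eta_2'\eta_3) = 2\alpha_{33}\alpha_{23}$

Proof.  As in Lemma A.3 we can write

$\eta_2'V\eta_2\,\eta_2'V\eta_3 = \eta_2'V\eta_2\,\eta_2'V(k\eta_2 + e) = k(\eta_2'V\eta_2)^2 + \eta_2'V\eta_2\,\eta_2'Ve$.

$\mathcal{E}\,\eta_2'V\eta_2\,\eta_2'V\eta_3 = k \cdot 2(\eta_2'\eta_2)^2 = 2(\eta_2'\eta_3)(\eta_2'\eta_2) = 2\alpha_{22}\alpha_{23}$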

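Lemma A.5  $\mathcal{E}(\eta_2'V\eta_3)^2 = (\eta_2'\eta_3)^2 + (\eta_2'\eta_2)(\eta_3'\eta_3) = \alpha_{23}^2 + \alpha_{22}\alpha_{33}$

Proof.  $(\eta_2'V\eta_3)^2 = [\eta_2'V(k\eta_2 + e)]^2 = (k\,\eta_2'V\eta_2 + \eta_2'Ve)^2$
  $= k^2(\eta_2'V\eta_2)^2 + 2k\,\eta_2'V\eta_2\,\eta_2'Ve + (\eta_2'Ve)^2$

$\eta_2'Ve = m^{-1/2}\eta_2'(A - mI)e = m^{-1/2}\eta_2'Ae$
  $= m^{-1/2}a^*_{12}(\eta_2'\eta_2)^{1/2}(e'e)^{1/2}$  (see (A.30))
  $= m^{-1/2}(\eta_2'\eta_2)^{1/2}(e'e)^{1/2}(\sum_{i=1}^{m} Z_{i1}Z_{i2})$,  $Z_{ij} \sim N(0, 1)$

and

$e'e = (\eta_3 - k\eta_2)'(\eta_3 - k\eta_2) = \eta_3'\eta_3 - (\eta_2'\eta_3)^2/\eta_2'\eta_2$

Hence

$\mathcal{E}(\eta_2'Ve)^2 = (1/m)\,\eta_2'\eta_2\,(\eta_3'\eta_3 - (\eta_2'\eta_3)^2/\eta_2'\eta_2)\,\mathcal{E}(\sum_{i=1}^{m} Z_{i1}Z_{i2})^2$
  $= \eta_2'\eta_2\,\eta_3'\eta_3 - (\eta_2'\eta_3)^2$

Thus

$\mathcal{E}(\eta_2'V\eta_3)^2 = k^2\,\mathcal{E}(\eta_2'V\eta_2)^2 + \mathcal{E}(\eta_2'Ve)^2$
  $= (\eta_2'\eta_3/\eta_2'\eta_2)^2 \cdot 2(\eta_2'\eta_2)^2 + \eta_2'\eta_2\,\eta_3'\eta_3 - (\eta_2'\eta_3)^2$
  $= (\eta_2'\eta_3)^2 + \eta_2'\eta_2\,\eta_3'\eta_3 = \alpha_{23}^2 + \alpha_{22}\alpha_{33}$

Using the above lemmas and the facts that $Y_1$, $Y_2$, $Y_3$, and $V$ are independent, that $Y_i \sim N_p(0, (m/n)I)$, $i = 1, 2, 3$, and that $\mathcal{E}V = 0$, we are now able, for sufficiently large $m$, $n$ and $m/n = 3$, to compute the expectations of the functions defined in Chapter One.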

(a. 32)  eoCY^.v)  = 4(p-i)(^22+3a"22) 

(A. 33)  eG(Yg,Y3>V)  = -Kp-1)(«  gg -Of  33)(a  ^-20?  X 

(^■3/ (a  22+«  33 -2a  23)) 

(A.34)  CK(Y1,Y2,Y3,V)  = (a  22(a  22-kx  ^-acr  23))^[3(p-2)-3(p-Dx 

(“.22 23)/a  22_3(p"1^*  22"“  23^ 

(“  22‘Wf  33_2af  23)_2(Qf.22-°f.23)+ 

(a  22"®  23^2(3+2(a  52~a  23^^ 


(*.22(“224“  SS’2®  23))J 
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(A. 35)  eca(Y1,Y, 
(A. 36)  eF2(YrY; 

(a. 37)  ch^y^y 


(A.38)  ec(YlfY2 


>)  » 3/2 


,.v)  = [(or  22»  33^  23)a+3(“  22-"  33)]/ 


<°  22W  33-a’'  23)-(3/2)(0'  28-“  33)2/ 


<“  aa"  .33"20  23)a 


:,y3,»)  = 21-4(0.  22^  g) 


22-°'  2i)Va  22-3<“  22-“  23)2/ 


(or  2a«r  33-20,  23)+3(o,  22+or  33-20  ^j 


+3*  22+2Qf  22(“  22W  33-^  231 
"2 (or  ^ -o>  23)3(3+2(or  22-“  23))7 


<“  22(“  22«  33-201  23))I/(*  22 


(a  _-+a  „ -2a  „„)) 
22  33  23 


F(Y2,Y3,V)  . (3/MI20,  22-(«.22^  ^ 23V 


<«  22"“  33-2"23,)/[0'  22 


(or  «2*or  „-20r  — )]* 
22  33  23,J 


(A. 39)  ec(Y1,Y2)H(Y1,Y2,Y3,v)  - (3/2)(a  22«  33-^23)/ 


[a,22<c'.22«33-a'23>l3/2 


(A.40)  eF(Y2,Y3,V)H(Y1,Y2,Y3>V) 


■ 22*  33‘“223)(110'  22+5“  33'1"  -23)/["^(“  22“  .33 


-3*  23)2] 

(a .41)  eo^.Yg)  = (l/8)(2|H.l)^22+3(p-l)/2a^22 

(A.42)  eG*(Y1.Y2>Y3,V)  - (o  22-0f  33)[-(2p+l)/8-(3/2)(p-l)/ 

(C'22«,33-a0,.23)]/(*.22',O.33-a'23)i 
(A.43)  CK*(Y1,Y2,Y3,V)  = [a  22 (0  22W  33-2a  23 ) ] “^[ -l2-3(p-l)  X 


22"*  23^“  224®  33*20f  23^“ 


(3(p-l)(or  gg -a  23)+8a  23)/cr 


(ff  .22“°  23)2(a  22-®  23+6)/[®  22 


(a  +cr  -2a  „„)]] 
22  33  23 


(A.44)  $\mathcal{E}[C^*(Y_1,Y_2,V)]^2 = \alpha_{22}/8 + 3/2$
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(A.45)  C[P*(Y1,Y2,v)]2  = C(o  22-ar  33)2+8(«  22«  33-ora23) 


+U8(“  22".33‘"£3>]/8(c'  22"  33-2“  23* 


— 3 ( Qf  -or  )s/2(ot  +ct  -2a  )s 

31  22  33  -22  -33  .23 


(AA6)  e[H*(Y1,Y2,Y3,v)]2  = t-(»  22-a  23)2+6(or 


a 22-6^  22“*  23)2/(“  22^  33_2<*  23^ 


+^“  22  23^°'  -22"“  23+6^^  22  X 


^ 22^  SS-2*  23^^°'  22  ^ 22^  33 


-2a  23) 


(A.Vr)  cc*(y1,y2,v)f*(y1,y2,y3,v) 


-[2(Q^22-aa23)+12(2a  22-or  23)+(«  22-a  33)(«  22-«  23) 
(«  22"  23+6)/(#  22"  33 ^ 23>,/t8“i22(“  22*  33 

-»  2j)^l 


(A. 48)  ec*(Y1,Y2,V)H*(Y1,Y2,Y3,V) 


’ <«  22"  23+6)[1’(“  22"  23)S/*  22(“  22"  33-“  23*1' 


“ U3  * 


II 

0 

Wo  22W  33'“  23)4] 

(aA9)  cf*(y1,y2,y3,v)h*(y1,y2,y3,v) 

■ [-^22W  22“  23+*  .22“  33"  33"  23'IWV<i(“  22+3"  33* 
+“  23(°  22^  23,(“  23+6)/"  22+12(“  22“  33,("  22^  23>7 
22-"  33'“  23)+("  22“  33K“  22-"  23)2(0r  22^  23+6)/ 
(a.22<“  22«  33'“  23))1/[^22(“  22**  ^ 23»- 




REFERENCES 


[1]  Anderson, T.W. (1958). An Introduction to Multivariate
     Statistical Analysis. John Wiley, New York.

[2]  Anderson, T.W. (1966). Some nonparametric multivariate
     procedures based on statistically equivalent blocks.
     Proc. 1st Internat. Symp. Mult. Analysis, 5-27.

[3]  Anderson, T.W. (1973). An asymptotic expansion of the
     distribution of the studentized classification
     statistic W. Ann. Statist., 1, 964-972.

[4]  Berk, R.H. (1966). Limiting behavior of posterior
     distributions when the model is incorrect. Ann. Math.
     Statist., 37, 51-58.

[5]  Bickel, P.J. and Yahav, J.A. (1968). Asymptotically
     optimal Bayes and minimax procedures in sequential
     estimation. Ann. Math. Statist., 39, 442-456.

[6]  Chanda, K.C. and Lee, J.C. (1975). A class of nonparametric
     classification rules. To appear in Some Statistical
     Methods Useful in Oil Exploration. Ed. D. B. Owen.

[7]  Chow, Y.S. and Robbins, H. (1965). On the asymptotic
     theory of fixed width sequential confidence intervals
     for the mean. Ann. Math. Statist., 36, 457-462.

[8]  Cover, T.M. (1968). Estimation using the nearest neighbor
     rule. IEEE Trans. Inform. Theory, IT-14, 50-57.

[9]  Cover, T.M. and Hart, P.E. (1967). Nearest neighbor
     pattern classification. IEEE Trans. Inform. Theory,
     IT-13, 21-26.

[10] Das Gupta, S. (1964). Nonparametric classification rules.
     Sankhya A, 26, 25-30.

[11] Das Gupta, S. and Kinderman, A. (1974). Classifiability
     and designs for sampling. Sankhya A, 36, 237-250.

[12] Fix, E. and Hodges, J.L. (1951). Nonparametric
     discrimination: Consistency properties. U.S. Air Force
     School of Aviation Medicine, Report No. 4, Randolph Field,
     Texas.

[13] Fraser, D.A.S. (1957). Nonparametric Methods in
     Statistics. John Wiley, New York.

[14] Goldstein, Matthew (1972). $k_n$-nearest neighbor
     classification. IEEE Trans. Inform. Theory, IT-18,
     627-630.

[15] Grams, W.F. and Serfling, R.J. (1973). Convergence rates
     for U-statistics and related statistics. Ann. Statist.,
     1, 153-160.

[16] Hoeffding, W. (1948). A class of statistics with
     asymptotically normal distribution. Ann. Math. Statist.,
     19, 293-325.

[17] Hoeffding, W. (1961). The strong law of large numbers
     for U-statistics. Institute of Statistics Mimeo Series
     No. 302, University of North Carolina, Chapel Hill, N.C.

[18] Hoeffding, W. (1963). Probability inequality for the
     sums of bounded random variables. Jour. Amer. Statist.
     Assoc., 58, 13-30.

[19] Hoeffding, W. and Wolfowitz, J. (1958).
     Distinguishability of sets of distributions. Ann. Math.
     Statist., 29, 700-718.

[20] Hudimoto, H. (1964). On a distribution-free two-way
     classification. Ann. Inst. Statist. Math., 31-36.

[21] Lehmann, E.L. (1951). Consistency and unbiasedness of
     certain nonparametric tests. Ann. Math. Statist., 22,
     165-179.

[22] Loftsgaarden, D.O. and Quesenberry, C.P. (1965). A
     nonparametric estimate of a multivariate density
     function. Ann. Math. Statist., 36, 1049-1051.

[23] Sen, P.K. (1960). On some convergence properties of
     U-statistics. Bull. Calcutta Statist. Assoc., 10, 1-18.

[24] Simons, G. (1968). On the cost of not knowing the
     variance when making a fixed-width confidence interval
     for the mean. Ann. Math. Statist., 39, 1946-1952.

[25] Srivastava, M.S. (1973). A sequential approach to
     classification: Cost of not knowing the covariance
     matrix. Jour. Mult. Analysis, 3, 173-183.

[26] Widder, D.V. (1961). Advanced Calculus. 2nd Edition,
     Prentice-Hall, Inc., N.J.


REPORT DOCUMENTATION PAGE

1.  REPORT NUMBER
    Technical Report No. 276

4.  TITLE (and Subtitle)
    On Some Aspects of the Classification Problem

7.  AUTHOR(s)
    Hsien Elsa Lin

9.  PERFORMING ORGANIZATION NAME AND ADDRESS
    Department of Theoretical Statistics
    University of Minnesota

11. CONTROLLING OFFICE NAME AND ADDRESS
    U.S. Army Research Office
    Post Office Box 12211
    Research Triangle Park, NC 27709

15. SECURITY CLASS. (of this report)
    Unclassified

16. DISTRIBUTION STATEMENT (of this Report)
    Approved for public release; distribution unlimited.

18. SUPPLEMENTARY NOTES
    The findings in this report are not to be construed as an official
    Department of the Army position, unless so designated by other
    authorized documents.

19. KEY WORDS
    Classification, multivariate normal, probability of misclassification,
    asymptotic expansion, nearest neighbor rules, sequential rules,
    U-statistics.

20. ABSTRACT

Different rules for the problem of classifying one (or more) units
to one of several distinct populations on the basis of random training
samples are considered, and their asymptotic probabilities of
misclassification (PMC) are derived.

First, we consider the problem of classifying one unit to one of
three distinct multivariate normal distributions with a common
covariance matrix. A plug-in rule is obtained by replacing the unknown
parameters by their estimates in the optimal classification rule.
Following T. W. Anderson, we obtain the asymptotic expansions of the
PMC's and the estimated PMC's of this plug-in rule.

In the second chapter, we consider the problem of classifying one
unit to one of two distinct populations with completely unknown
distribution functions. The usual nearest neighbor (NN) rules cannot be
applied if the observations are available only in their relative orders
or ranks. Using the basic ideas of NN rules, we propose some rules
expressed in terms of ranks and derive the asymptotic PMC's of the rules.

In the third chapter, rules based on U-statistics are suggested and
the asymptotic PMC's of the rules are obtained, together with the rate
of convergence as the sizes of the training samples approach infinity.

Finally, we consider sequential rules based on U-statistics in
order to control the PMC uniformly and arbitrarily. The moment generating
function is shown to exist. The proof of the asymptotic properties of the
sequential rules suggested by Srivastava for classification into one
of two multivariate normal distributions is made rigorous and corrected.